Abstract

An elementary construction of the Wiener process is discussed, based on a suitable sequence of simple symmetric random walks that converge uniformly on bounded intervals, with probability 1. This method is a simplification of the constructions of F.B. Knight and P. Revesz. The same sequence is applied to give elementary (Lebesgue-type) definitions of Ito- and Stratonovich-sense stochastic integrals and to prove the basic Ito formula. The resulting approximating sums converge with probability 1. As a by-product, new elementary proofs are given for some properties of the Wiener process, such as the almost sure non-differentiability of the sample-functions. The purpose of using elementary methods almost exclusively is twofold: first, to provide an introduction to these topics for a wide audience; second, to create an approach well suited for generalization and for attacking otherwise hard problems.
1 Introduction

The Wiener process is undoubtedly one of the most important stochastic processes, both in theory and in applications. It was originally introduced as a mathematical model of Brownian motion, the random zigzag motion of microscopic particles suspended in a liquid, discovered by the Scottish botanist Robert Brown in 1827. An amazing number of first-class scientists, such as Bachelier, Einstein, Smoluchowski, Wiener, and Levy, to mention just a few, contributed to the theory of Brownian motion. In the course of the evolution of probability theory it became clear that the Wiener process is a basic tool for many limit theorems. It is also a natural model of many phenomena involving randomness, such as noise, random fluctuations, or perturbations.

The Wiener process is a natural model of Brownian motion. It describes a random but continuous motion of a particle subjected to the influence of a large number of chaotically moving molecules of the liquid. Any displacement of the particle over an interval of time is a sum of many almost independent small influences. It is therefore normally distributed, with expectation zero and variance proportional to the length of the time interval. Displacements over disjoint time intervals are independent.

The most basic types of stochastic integrals were introduced by K. Ito and R. L. Stratonovich as tools for investigating stochastic differential equations, that is, differential equations containing random functions. Not surprisingly, the Wiener process is one of the cornerstones on which the theory of stochastic integrals and differential equations was built.

Stochastic differential equations are applied under conditions similar to those of differential equations in general. The advantage of the stochastic model is that it can accommodate noise or other randomly changing inputs and effects, which is a necessity in many applications. When solving a stochastic differential equation, one has to integrate a function with respect to the increments of a stochastic process such as the Wiener process. In such a case the classical methods of integration cannot be applied directly, because of the "strange" behaviour of the increments of the Wiener and similar processes.

A main purpose of this paper is to provide an elementary introduction to the aforementioned topics. The discussion of the Wiener process is based on a nice, natural construction of P. Revesz [6, Section 6.2], which is essentially a simplified version of F.B. Knight's [4, Section 1.3]. We use a suitable sequence of simple random walks that converge to the Wiener process. Then an elementary definition and discussion of stochastic integrals is given, based on [8], which uses the same sequence of random walks.

The level of the paper is (hopefully) accessible to any good student who has taken the usual calculus sequence and an introductory course in probability. Our general reference will be W. Feller's excellent elementary textbook [2]. Anything that goes beyond the material of that book will be discussed here in detail. I would like to convince the reader that these important and widely used topics are natural and feasible supplements to a strong introductory course in probability; in this way a much wider audience could become acquainted with them. However, I have to warn the non-expert reader that "elementary" is not a synonym for "easy" or "short".

To encourage the reader, it seems worthwhile to emphasize a very useful feature of elementary approaches: in many cases, elementary methods are easier to generalize and can be used to attack otherwise hard problems.

2 Random Walks 

The simplest (and crudest) model of Brownian motion is a simple symmetric random walk 
in one dimension, hereafter random walk for brevity. 

A particle starts from the origin and, in each unit of time, steps one unit either to the left or to the right with equal probabilities 1/2. Mathematically, we have a sequence $X_1, X_2, \dots$ of independent and identically distributed random variables with

$$P\{X_n = 1\} = P\{X_n = -1\} = 1/2 \qquad (n = 1, 2, \dots),$$

and the position of the particle at time $n$ (that is, the random walk) is given by the partial sums

$$S_0 = 0, \qquad S_n = X_1 + X_2 + \cdots + X_n \qquad (n = 1, 2, \dots). \qquad (1)$$

The notation $X(n)$ and $S(n)$ will be used instead of $X_n$ and $S_n$ where it seems advantageous.

A bit of terminology: a stochastic process is a collection $Z(t)$ $(t \in T)$ of random variables defined on a sample space $\Omega$. Usually $T$ is a subset of the real line and $t$ is called "time". An important concept is that of a sample-function, that is, a randomly selected path of a stochastic process. A sample-function of a stochastic process $Z(t)$ can be denoted by $Z(t;\omega)$, where $\omega \in \Omega$ is fixed but the "time" $t$ is not.

To visualize the graph of a sample-function of the random walk, one can use a broken line connecting the vertices $(n, S_n)$, $n = 0, 1, 2, \dots$ (Figure 1). In this way the sample-functions are extended from the set of non-negative integers to continuous functions on the interval $[0, \infty)$:

$$S(t) = S_n + (t - n)X_{n+1} \qquad (n \le t < n + 1;\ n = 0, 1, 2, \dots). \qquad (2)$$

It is easy to evaluate the expectation and variance of $S_n$ (the variance formula uses independence and $E(X_k) = 0$):

$$E(S_n) = \sum_{k=1}^{n} E(X_k) = 0, \qquad \mathrm{Var}(S_n) = \sum_{k=1}^{n} E(X_k^2) = n. \qquad (3)$$
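As a quick sanity check of (1)-(3), the Python sketch below does two things. It simulates the walk together with its broken-line extension (2), and it estimates $E(S_n)$ and $\mathrm{Var}(S_n)$ by Monte Carlo. The function names, seed and sample sizes are illustrative choices, not part of the construction.

```python
import random

def random_walk(n, rng):
    """Partial sums S_0, S_1, ..., S_n of n i.i.d. steps X_k = +1 or -1, as in (1)."""
    s = [0]
    for _ in range(n):
        s.append(s[-1] + rng.choice((1, -1)))
    return s

def broken_line(s, t):
    """The piecewise linear extension (2) of the walk s, evaluated at a real t >= 0."""
    n = int(t)
    if n >= len(s) - 1:
        return float(s[-1])
    return s[n] + (t - n) * (s[n + 1] - s[n])  # s[n + 1] - s[n] is the step X_{n+1}

def empirical_moments(n, trials, seed=0):
    """Monte Carlo estimates of E(S_n) and Var(S_n); (3) predicts 0 and n."""
    rng = random.Random(seed)
    finals = [random_walk(n, rng)[-1] for _ in range(trials)]
    mean = sum(finals) / trials
    var = sum((x - mean) ** 2 for x in finals) / (trials - 1)
    return mean, var

mean, var = empirical_moments(n=100, trials=10_000)
print(f"mean ~ {mean:.3f} (theory 0), variance ~ {var:.1f} (theory 100)")
```

The estimates fluctuate around 0 and 100 with standard errors of about 0.1 and 1.4, respectively.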
Figure 1: The graph of a sample-function of $S(t)$.

The distribution of $S_n$ is a linearly transformed symmetric binomial distribution [2, Section III,2]. Each path (broken line) of length $n$ has probability $1/2^n$. The number of paths going from the origin to the point $(n, r)$ equals the number of ways of choosing $(n + r)/2$ steps to the right out of $n$ steps. Consequently,

$$P\{S_n = r\} = \binom{n}{(n + r)/2} \frac{1}{2^n} \qquad (|r| \le n).$$

The binomial coefficient here is considered to be zero when $n + r$ is not divisible by 2. Equivalently, $S_n = 2B_n - n$, where $B_n$ is a symmetric $(p = 1/2)$ binomial random variable, $P\{B_n = k\} = \binom{n}{k} 2^{-n}$.

An elementary computation shows that for $n$ large, the binomial distribution can be approximated by the normal distribution, see [2, Section VII,2]. What is shown there is the following. For even numbers $n = 2\nu$ and $r = 2k$, if $n \to \infty$ and $|r| \le K_n = o(n^{2/3})$, one has

$$P\{S_n = r\} = \binom{2\nu}{\nu + k} \frac{1}{2^{2\nu}} \sim \frac{1}{\sqrt{\pi\nu}}\, e^{-k^2/\nu} = 2h\,\varphi(rh), \qquad (4)$$

where $h = 1/\sqrt{n}$ and $\varphi(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$ $(-\infty < x < \infty)$ is the standard normal density function. Note that for odd numbers $n = 2\nu + 1$ and $r = 2k + 1$, (4) can be proved in the same way as for even numbers.

Here and later we adopt the usual notations $a_n \sim b_n$ for $\lim_{n\to\infty} a_n/b_n = 1$ ($a_n$ and $b_n$ are asymptotically equal), and $a_n = o(b_n)$ for $\lim_{n\to\infty} a_n/b_n = 0$.
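The local approximation (4) is easy to check numerically. The sketch below compares the exact binomial probability $P\{S_n = r\}$ with $2h\varphi(rh)$, $h = 1/\sqrt{n}$, for a few values of $r$ well inside the range $|r| = o(n^{2/3})$. It is a minimal illustration, and the helper names are hypothetical.

```python
import math

def pmf_exact(n, r):
    """P{S_n = r} = C(n, (n + r)/2) / 2^n; zero if n + r is odd or |r| > n."""
    if abs(r) > n or (n + r) % 2:
        return 0.0
    return math.comb(n, (n + r) // 2) / 2 ** n  # exact big-integer ratio, rounded once

def pmf_normal(n, r):
    """Right-hand side of (4): 2 h phi(r h) with h = 1 / sqrt(n)."""
    h = 1 / math.sqrt(n)
    phi = math.exp(-(r * h) ** 2 / 2) / math.sqrt(2 * math.pi)
    return 2 * h * phi

n = 1000
for r in (0, 10, 30, 60):
    exact, approx = pmf_exact(n, r), pmf_normal(n, r)
    print(f"r={r:3d}  exact={exact:.6f}  normal={approx:.6f}  ratio={exact / approx:.4f}")
```

Note that the factor 2 in $2h\varphi(rh)$ reflects the fact that $S_n$ only takes values of the same parity as $n$, spaced 2 apart.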

Equation (4) easily implies a special case of the central limit theorem and of the large deviation theorem [2, Sections VII,3 and 6]:

Theorem 1 (a) For any fixed real $x$ and $n \to \infty$ we have

$$P\{S_n/\sqrt{n} \le x\} \to \Phi(x),$$

where $\Phi(x) = (1/\sqrt{2\pi}) \int_{-\infty}^{x} e^{-u^2/2}\, du$ $(-\infty < x < \infty)$ is the standard normal distribution function.

(b) If $n \to \infty$ and $x_n \to \infty$ so that $x_n = o(n^{1/6})$, then

$$P\{S_n/\sqrt{n} \ge x_n\} \sim 1 - \Phi(x_n),$$

$$P\{S_n/\sqrt{n} \le -x_n\} \sim \Phi(-x_n) = 1 - \Phi(x_n). \quad \square$$

For us the most essential statement of the theorem is the following: when $x_n$ goes to infinity (slower than $n^{1/6}$), the two sides of the asymptotic equalities in (b) tend to zero equally fast, and in fact very fast. For, to estimate $1 - \Phi(x)$ for $x$ large, one can use the following inequality, see [2, Section VII,1]:

$$1 - \Phi(x) < \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2} \qquad (x > 0). \qquad (5)$$
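Thus, fixing an $\epsilon > 0$, say $\epsilon = 1/2$, there exists an integer $n_0 > 0$ such that

$$P\{|S_n|/\sqrt{n} \ge x_n\} \le \frac{2 + \epsilon}{x_n\sqrt{2\pi}}\, e^{-x_n^2/2} < e^{-x_n^2/2} \qquad (6)$$

for $n \ge n_0$, whenever $x_n \to \infty$ and $x_n = o(n^{1/6})$ as $n \to \infty$. It is important to observe the following. Although $S_n$ can take on, with positive probability, every integer from $-n$ to $n$ that has the same parity as $n$, the event $\{|S_n| \ge x_n\sqrt{n}\}$ is negligible as $n \to \infty$.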

But what can we do if $n$ does not go to $\infty$, or if the condition $x_n = o(n^{1/6})$ does not hold? Then a simple but still powerful tool, Chebyshev's inequality, can be used. A standard form of Chebyshev's inequality [2, Section IX,6] is

$$P\{|X - E(X)| \ge t\} \le \frac{\mathrm{Var}(X)}{t^2}$$

for any $t > 0$, supposing $\mathrm{Var}(X)$ is finite. Another form, which can be proved similarly, is

$$P\{|X| \ge t\} \le \frac{E(|X|)}{t} \qquad (7)$$

for any $t > 0$ if $E(|X|)$ is finite. If the $k$th absolute moment $E(|X|^k)$ of $X$ is finite $(k > 0)$, then one can apply (7) to $|X|^k$, getting

$$P\{|X| \ge t\} = P\{|X|^k \ge t^k\} \le \frac{E(|X|^k)}{t^k}$$

for any $t > 0$.

One can even get an upper bound going to 0 exponentially fast as $t \to \infty$ if $E(e^{u_0 X})$, the moment generating function of $X$, is finite for some $u_0 > 0$. For then, by (7),

$$P\{X \ge t\} = P\{u_0 X \ge u_0 t\} = P\{e^{u_0 X} \ge e^{u_0 t}\} \le e^{-u_0 t} E(e^{u_0 X}), \qquad (8)$$

for any $t > 0$.

Analogously, if $E(e^{-u_0 X})$ is finite for some $u_0 > 0$, then

$$P\{X \le -t\} = P\{-u_0 X \ge u_0 t\} = P\{e^{-u_0 X} \ge e^{u_0 t}\} \le e^{-u_0 t} E(e^{-u_0 X}), \qquad (9)$$

for any $t > 0$. Combining (8) and (9), one gets

$$P\{|X| \ge t\} = P\{X \ge t\} + P\{X \le -t\} \le e^{-u_0 t}\left(E(e^{u_0 X}) + E(e^{-u_0 X})\right), \qquad (10)$$

for any $t > 0$, if the moment generating function is finite both at $u_0$ and at $-u_0$.

Now, it is easy to find the moment generating function of one step of the random walk:

$$E(e^{uX_k}) = e^{u}(1/2) + e^{-u}(1/2) = \cosh u.$$

Hence, using the independence of the steps, one obtains the moment generating function of the random walk $S_n$:

$$E(e^{uS_n}) = E\left(e^{u\sum_{k=1}^{n} X_k}\right) = E\left(\prod_{k=1}^{n} e^{uX_k}\right) = (\cosh u)^n \qquad (-\infty < u < \infty,\ n \ge 0). \qquad (11)$$

Since $\cosh u$ is an even function and $\cosh 1 < 2$, (10) with $u_0 = 1$ implies that

$$P\{|S_n| \ge t\} \le 2 \cdot 2^n e^{-t} \qquad (t > 0,\ n \ge 0). \qquad (12)$$
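To see how crude (12) is, the sketch below compares three quantities, and the function names are hypothetical:

- the exact tail $P\{|S_n| \ge t\}$;
- the intermediate bound $2(\cosh 1)^n e^{-t}$, which comes from (10) and (11) before $\cosh 1$ is replaced by 2;
- (12) itself.

Both bounds exceed 1, and are therefore useless, unless $t$ is larger than a constant multiple of $n$. Their role in the text is only to cover the range where the large deviation estimate (6) is not available.

```python
import math

def tail_exact(n, t):
    """P{|S_n| >= t}, summed exactly over the binomial law of S_n = 2 B_n - n."""
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= t)
    return count / 2 ** n

def bound_cosh(n, t, u0=1.0):
    """(10) combined with (11): e^{-u0 t} (E e^{u0 S_n} + E e^{-u0 S_n}) = 2 cosh(u0)^n e^{-u0 t}."""
    return 2 * math.cosh(u0) ** n * math.exp(-u0 * t)

def bound_12(n, t):
    """The cruder bound (12), obtained from cosh(1) < 2: 2 * 2^n * e^{-t}."""
    return 2 * 2 ** n * math.exp(-t)

for n, t in [(10, 8), (20, 16), (40, 30), (40, 40)]:
    print(n, t, tail_exact(n, t), bound_cosh(n, t), bound_12(n, t))
```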

3 Waiting Times 

In the sequel we need the distribution of the random time $\tau$ when a random walk first hits either the point $x = 2$ or $-2$:

$$\tau = \tau_1 = \min\{n : |S_n| = 2\}. \qquad (13)$$

To find the probability distribution of $\tau$, imagine the random walk as a sequence of pairs of steps. These (independent) pairs can be classified either as a "return", $(1, -1)$ or $(-1, 1)$, or as a "change of magnitude 2", $(1, 1)$ or $(-1, -1)$. Both cases have the same probability 1/2.

Clearly, $\tau$ is an odd number with probability zero, since starting from 0 the walk cannot be at $\pm 2$ after an odd number of steps. The event $\{\tau = 2j\}$ occurs exactly when $j - 1$ "returns" are followed by a "change of magnitude 2". Because the pairs of steps are independent, $P\{\tau = 2j\} = 1/2^j$. It means that $\tau = 2Y$, where $Y$ has a geometric distribution with parameter $p = 1/2$:

$$P\{\tau = 2j\} = P\{Y = j\} = 1/2^j \qquad (j \ge 1). \qquad (14)$$

Hence,

$$E(\tau) = 2E(Y) = 2(1/p) = 4, \qquad \mathrm{Var}(\tau) = 2^2\,\mathrm{Var}(Y) = 2^2 (1 - p)/p^2 = 8. \qquad (15)$$
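The distribution (14) and the moments (15) can be checked by direct simulation of the waiting time (13). This is only an illustrative Monte Carlo sketch; the seed and sample size are arbitrary choices.

```python
import random

def waiting_time(rng):
    """Simulate tau = min{n : |S_n| = 2} from (13)."""
    s = n = 0
    while abs(s) != 2:
        s += rng.choice((1, -1))
        n += 1
    return n

def sample_tau(trials, seed=1):
    """Return a sample of tau together with its sample mean and sample variance."""
    rng = random.Random(seed)
    sample = [waiting_time(rng) for _ in range(trials)]
    mean = sum(sample) / trials
    var = sum((x - mean) ** 2 for x in sample) / (trials - 1)
    return sample, mean, var

sample, mean, var = sample_tau(50_000)
print(f"E(tau) ~ {mean:.3f} (theory 4), Var(tau) ~ {var:.3f} (theory 8)")
print(f"P(tau = 2) ~ {sample.count(2) / len(sample):.3f} (theory 1/2)")
```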
An important consequence is that, with probability 1, a random walk sooner or later hits 2 or $-2$:

$$P\{\tau < \infty\} = \sum_{j=1}^{\infty} 1/2^j = 1.$$

It is also quite obvious that

$$P\{S(\tau) = 2\} = P\{S(\tau) = -2\} = 1/2. \qquad (16)$$

This follows from the symmetry of the random walk. If we reflect $S(t)$ about the time axis, the resulting process $S^*(t) = -S(t)$ is also a random walk. Its corresponding $\tau^*$ is equal to $\tau$, and the event $\{S^*(\tau) = 2\}$ is the same as $\{S(\tau) = -2\}$. Since $S^*(t)$ is just the same sort of random walk as $S(t)$, we also have $P\{S^*(\tau) = 2\} = P\{S(\tau) = 2\}$.

Another way to show (16) is to use the fact that the waiting time $\tau$ has countably many possible values, and for any specific value we have symmetry:

$$\begin{aligned}
P\{S(\tau) = 2\} &= \sum_{j=1}^{\infty} P\{S(2j) = 2 \mid \tau = 2j\}\, P\{\tau = 2j\} \\
&= \sum_{j=1}^{\infty} P\{A_{2j-2},\ X_{2j} = X_{2j-1} = 1 \mid A_{2j-2},\ X_{2j} = X_{2j-1}\}\, P\{\tau = 2j\} \\
&= (1/2) \sum_{j=1}^{\infty} P\{\tau = 2j\} = 1/2,
\end{aligned}$$

where $A_{2j-2}$ denotes the event that each of the first $j - 1$ pairs is a "return", i.e. $A_{2j-2} = \{X_2 = -X_1, \dots, X_{2j-2} = -X_{2j-3}\}$, and $A_0 = \Omega$ is the sure event.

We mention that (16) also illustrates a consequence of the so-called optional sampling theorem: $E(S(\tau)) = 2P\{S(\tau) = 2\} + (-2)P\{S(\tau) = -2\} = 0$, which is the same as the expectation $E(S(t)) = 0$ of the walk at any fixed time $t$.

We also need the probability of the event that a random walk starting from the point $x = 1$ hits $x = 2$ before hitting $x = -2$. This is equal to the conditional probability $P\{S(\tau) = 2 \mid X_1 = 1\}$. If $X_1 = 1$, then $X_2 = 1$ with probability 1/2, and then $\tau = 2$ and $S(\tau) = 2$ as well: $P\{S(\tau) = 2,\ \tau = 2 \mid X_1 = 1\} = 1/2$.

On the other hand, if $X_1 = 1$, then $\tau > 2$ if and only if $X_2 = -1$, which has probability 1/2. That is, at the second step the walk returns to the origin and starts "from scratch". Then by (16), the random walk hits 2 sooner than $-2$ with probability 1/2, so $P\{S(\tau) = 2,\ \tau > 2 \mid X_1 = 1\} = (1/2)(1/2) = 1/4$. Therefore

$$\begin{aligned}
P\{S(\tau) = 2 \mid X_1 = 1\} &= P\{S(\tau) = 2,\ \tau = 2 \mid X_1 = 1\} + P\{S(\tau) = 2,\ \tau > 2 \mid X_1 = 1\} \\
&= (1/2) + (1/4) = 3/4. \qquad (17)
\end{aligned}$$

It then also follows that

$$P\{S(\tau) = -2 \mid X_1 = 1\} = 1 - (3/4) = 1/4. \qquad (18)$$
(16), (17), and (18) are special cases of ruin probabilities [2, Section XIV,2]. For example, it can be shown that the probability that a random walk hits the level $a > 0$ before hitting the level $-b < 0$ is $b/(a + b)$.
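The ruin probability $b/(a + b)$ covers both (16) and (17):

- For (16), take $a = b = 2$.
- For (17), note that a walk started at 1 hits 2 before $-2$ exactly when a walk started at 0 hits 1 before $-3$, so $a = 1$, $b = 3$.

The sketch below, with hypothetical function names, compares the formula with a Monte Carlo estimate.

```python
import random
from fractions import Fraction

def hits_upper_first(start, upper, lower, rng):
    """Run a simple symmetric walk from `start` until it reaches `upper` or `lower`;
    return True if `upper` is reached first."""
    s = start
    while lower < s < upper:
        s += rng.choice((1, -1))
    return s == upper

def ruin_probability(a, b):
    """P{walk from 0 hits a > 0 before -b < 0} = b / (a + b)."""
    return Fraction(b, a + b)

rng = random.Random(2)
trials = 40_000
estimate = sum(hits_upper_first(1, 2, -2, rng) for _ in range(trials)) / trials
print(f"P(S(tau) = 2 | X_1 = 1) ~ {estimate:.4f}; formula gives {ruin_probability(1, 3)}")
```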

Extending definition (13) of $\tau$, for $k = 1, 2, \dots$ we recursively define

$$\tau_{k+1} = \min\{n : n > 0,\ |S(T_k + n) - S(T_k)| = 2\},$$

where

$$T_k = T(k) = \tau_1 + \tau_2 + \cdots + \tau_k. \qquad (19)$$

Then each $\tau_k$ has the same distribution as $\tau = \tau_1$. For,

$$\begin{aligned}
P\{\tau_{k+1} = 2j \mid T_k = 2m\} &= P\{\min\{n : n > 0,\ |S(2m + n) - S(2m)| = 2\} = 2j \mid T_k = 2m\} \\
&= P\{\min\{n : n > 0,\ |S(n)| = 2\} = 2j\} = P\{\tau_1 = 2j\} = 1/2^j,
\end{aligned}$$

where $k \ge 1$, $j \ge 1$, and $m \ge 1$ are arbitrary. The second equality above follows from two facts.

- First, each increment $S(2m + n) - S(2m)$ is independent of the event $\{T_k = 2m\}$. The increment depends only on the random variables $X_i$ $(2m + 1 \le i \le 2m + n)$, while the event $\{T_k = 2m\}$ is determined exclusively by the random variables $X_i$ $(1 \le i \le 2m)$, the corresponding "past".
- Second, each increment $S(2m + n) - S(2m)$ has the same distribution as $S(n)$, since both of them are sums of $n$ independent $X_i$.

Hence $\tau_{k+1}$ is independent of $T_k$ (and also of any $T_i$, $i < k$), so indeed $P\{\tau_{k+1} = 2j\} = 1/2^j$ $(j \ge 1)$.

We also need the distribution of the random time $T_k$ required by $k$ changes of magnitude 2 along the random walk. In other words, $S(t)$ hits even integers (different from the previous one) exclusively at the time instants $T_1, T_2, \dots$. To find the probability distribution of $T_k$, imagine the random walk again as a sequence of independent pairs of steps, "returns" and "changes of magnitude 2", both types having probability 1/2. The last pair is necessarily a change of magnitude 2. Hence the number of ways the event $\{T_k = 2j\}$ $(j \ge k)$ can occur is the number of ways of choosing, among the first $j - 1$ pairs, the $k - 1$ pairs where a change of magnitude 2 occurs. Therefore

$$P\{T_k = 2j\} = \binom{j - 1}{k - 1} \frac{1}{2^j} \qquad (j \ge k \ge 1). \qquad (20)$$

It means that $T_k = 2N_k$, where $N_k$ has a negative binomial distribution with $p = 1/2$ [2, Section VI,8].

All this also follows from the fact that $N_k = T_k/2$ is the sum of $k$ independent, geometrically distributed random variables with parameter $p = 1/2$, see (14) and (19): $N_k = Y_1 + Y_2 + \cdots + Y_k$ $(Y_j = \tau_j/2)$. Then $T_k$ is finite valued with probability 1, and the expectation and variance of $T_k$ easily follow from (15) and (19):

$$E(T_k) = kE(\tau) = 4k, \qquad \mathrm{Var}(T_k) = k\,\mathrm{Var}(\tau) = 8k. \qquad (21)$$
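Formula (20) can be verified exactly, with rational arithmetic, by convolving $k$ copies of the law (14) of $\tau$, as suggested by (19). In the sketch below (names are illustrative) the list index $j$ stands for the value $2j$. Truncating at $j_{\max}$ does not affect the entries with index at most $j_{\max}$, because all summands are positive. The last lines evaluate the moments (21) from (20), truncating the series at a point where the remainder is negligible.

```python
from fractions import Fraction
from math import comb

def convolve(p, q, jmax):
    """Law of the sum of two independent variables with pmfs p, q (index j <-> value 2j)."""
    out = [Fraction(0)] * (jmax + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j > jmax:
                break
            out[i + j] += pi * qj
    return out

def T_law_by_convolution(k, jmax):
    """P{T_k = 2j}, j = 0..jmax, from k independent copies of tau with P{tau = 2j} = 1/2^j."""
    tau = [Fraction(0)] + [Fraction(1, 2 ** j) for j in range(1, jmax + 1)]
    law = [Fraction(1)] + [Fraction(0)] * jmax  # T_0 = 0
    for _ in range(k):
        law = convolve(law, tau, jmax)
    return law

def T_law_formula(k, j):
    """Formula (20): P{T_k = 2j} = C(j - 1, k - 1) / 2^j for j >= k >= 1, else 0."""
    return Fraction(comb(j - 1, k - 1), 2 ** j) if j >= k else Fraction(0)

k = 3
mean = sum(2 * j * T_law_formula(k, j) for j in range(1, 200))
second = sum((2 * j) ** 2 * T_law_formula(k, j) for j in range(1, 200))
print(float(mean), float(second - mean ** 2))  # close to 4k = 12 and 8k = 24, by (21)
```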
It is worth mentioning that $T_k$ is a stopping time for each $k \ge 1$. By definition, this means that any event of the form $\{T_k \le j\}$ depends exclusively on the corresponding "past" $S(t)$ $(t \le j)$. In other words, $S_1, \dots, S_j$ determine whether $\{T_k \le j\}$ occurs or not.

Fortunately, the central limit and the large deviation theorems (see Theorem 1) can be proved for negative binomial distributions in the same fashion as for binomial distributions.

Theorem 2 (a) For any fixed real $x$ and $k \to \infty$ we have

$$P\left\{\frac{T_k - 4k}{\sqrt{8k}} \le x\right\} \to \Phi(x).$$

(b) If $k \to \infty$ and $x_k \to \infty$ so that $x_k = o(k^{1/6})$, then

$$P\left\{\frac{T_k - 4k}{\sqrt{8k}} \ge x_k\right\} \sim 1 - \Phi(x_k),$$

$$P\left\{\frac{T_k - 4k}{\sqrt{8k}} \le -x_k\right\} \sim \Phi(-x_k) = 1 - \Phi(x_k).$$

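Proof. The normal approximation (4) is applicable to negative binomial distributions too. If $r = 2j$ and $k \to \infty$, write $k - 1 = ((j - 1) + (2k - j - 1))/2$ and apply (4) with $n = j - 1$:

$$\begin{aligned}
P\{T_k = r\} &= \binom{j - 1}{k - 1} \frac{1}{2^j} = \frac{1}{2} \binom{j - 1}{\frac{(j - 1) + (2k - j - 1)}{2}} \frac{1}{2^{j-1}} \\
&\sim \frac{1}{2} \frac{1}{\sqrt{\pi (j - 1)/2}} \exp\left(-\frac{\left((2k - j - 1)/2\right)^2}{(j - 1)/2}\right)
= \frac{1}{\sqrt{\pi (r - 2)}} \exp\left(-\frac{(r - 4k + 2)^2}{4(r - 2)}\right), \qquad (22)
\end{aligned}$$

supposing $|2k - j - 1| = o((j - 1)^{2/3})$, or equivalently,

$$|r - 4k| = o(k^{2/3}). \qquad (23)$$

A routine computation shows that (22) is asymptotically equal to

$$\frac{1}{\sqrt{4\pi k}} \exp\left(-\frac{(r - 4k)^2}{16k}\right)$$

when $k \to \infty$ and (23) holds. Therefore we get an analogue of (4): if $k \to \infty$ and $r$ is any even number such that $|r - 4k| \le K_k = o(k^{2/3})$, then

$$P\{T_k = r\} \sim 2h\,\varphi((r - 4k)h), \qquad h = 1/\sqrt{8k}, \qquad (24)$$

where $\varphi$ denotes the standard normal density function.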
Then, in the same way as the statements of Theorem 1 are obtained from (4) in [2, Sections VII,3 and 6], one can get the present theorem from (24). Here we recall only the basic step of the argument:

$$P\left\{x_1 < \frac{T_k - 4k}{\sqrt{8k}} \le x_2\right\} \sim \sum_{\{r:\ x_1 < (r - 4k)h \le x_2,\ r \text{ even}\}} 2h\,\varphi((r - 4k)h) \to \int_{x_1}^{x_2} \varphi(t)\, dt = \Phi(x_2) - \Phi(x_1),$$

for any $x_1 < x_2$, when $k \to \infty$ and so $h \to 0$. The simple meaning of this is that Riemann sums converge to the corresponding integral. $\square$
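The local limit relation (24) used in this proof can be illustrated in the same way as (4). The sketch below (hypothetical helper names) evaluates the exact probability (20) and the approximation $2h\varphi((r - 4k)h)$, $h = 1/\sqrt{8k}$, for $k = 400$ and a few even values of $r$ near $4k$. At this $k$ the ratios are within about 4% of 1 for the offsets shown. The agreement is better near the center, because the negative binomial law is skewed.

```python
import math

def T_pmf(k, r):
    """Exact P{T_k = r} from (20); zero unless r = 2j is even with j >= k."""
    if r % 2 or r < 2 * k:
        return 0.0
    j = r // 2
    return math.comb(j - 1, k - 1) / 2 ** j

def T_pmf_normal(k, r):
    """Right-hand side of (24): 2 h phi((r - 4k) h) with h = 1 / sqrt(8k)."""
    h = 1 / math.sqrt(8 * k)
    x = (r - 4 * k) * h
    return 2 * h * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

k = 400
for offset in (-40, -20, 0, 20, 40):
    r = 4 * k + offset
    print(offset, round(T_pmf(k, r) / T_pmf_normal(k, r), 4))
```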

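In the same fashion as the large deviation inequality (6) was obtained for $S_n$, Theorem 2(b) and (5) imply a large deviation type inequality for $T_k$. There exists an integer $k_0 > 0$ such that

$$P\left\{\frac{|T_k - 4k|}{\sqrt{8k}} \ge x_k\right\} < e^{-x_k^2/2} \qquad (25)$$

for $k \ge k_0$, supposing $x_k \to \infty$ and $x_k = o(k^{1/6})$ as $k \to \infty$.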

As in the case of $S_n$, for $T_k$ too we need a substitute for the large deviation inequality if the assumptions $k \to \infty$ or $x_k = o(k^{1/6})$ do not hold. The moment generating function of $\tau_n$ is simple:

$$E(e^{u\tau_n}) = \sum_{j=1}^{\infty} e^{2ju} \frac{1}{2^j} = \frac{e^{2u}/2}{1 - e^{2u}/2} = \frac{1}{2e^{-2u} - 1}. \qquad (26)$$

This function is finite if $u < \log\sqrt{2}$. Here and afterwards, $\log$ denotes the logarithm with base $e$.
Now the moment generating function of $T_k$ follows from the independence of the $\tau_n$'s:

$$E(e^{uT_k}) = E\left(e^{u\sum_{n=1}^{k} \tau_n}\right) = E\left(\prod_{n=1}^{k} e^{u\tau_n}\right) = (2e^{-2u} - 1)^{-k} \qquad (u < \log\sqrt{2},\ k \ge 0). \qquad (27)$$

We also need the moment generating function of the centered and "normalized" random variable $(T_k - 4k)/\sqrt{8}$, whose expectation is 0 and whose variance is $k$:

$$E\left(e^{u(T_k - 4k)/\sqrt{8}}\right) = e^{-4ku/\sqrt{8}}\, E\left(e^{T_k u/\sqrt{8}}\right) = \left(2e^{u/\sqrt{2}} - e^{u\sqrt{2}}\right)^{-k}, \qquad (28)$$

for $u < \sqrt{2}\log 2$ and $k \ge 0$. Since (28) is less than $2^k$ for $u = \pm 1/2$, the exponential Chebyshev inequality (10) implies that

$$P\left\{|T_k - 4k|/\sqrt{8} \ge t\right\} \le 2 \cdot 2^k e^{-t/2} \qquad (t > 0,\ k \ge 0). \qquad (29)$$
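The closed forms (26)-(28) and the bound (29) can be checked numerically. The sketch below, with illustrative function names, does four things:

- it compares the series in (26) with its closed form;
- it checks the identity (28);
- it confirms that (28) is below $2^k$ at $u = \pm 1/2$;
- it compares (29) with the exact tail computed from (20), truncated at a large $j$.

```python
import math

def mgf_tau_series(u, jmax=2000):
    """Partial sum of (26): sum_{j=1}^{jmax} e^{2ju} / 2^j (converges for u < log sqrt 2)."""
    q = math.exp(2 * u) / 2
    return sum(q ** j for j in range(1, jmax + 1))

def mgf_tau(u):
    """Closed form of (26): 1 / (2 e^{-2u} - 1)."""
    return 1 / (2 * math.exp(-2 * u) - 1)

def mgf_centered(u, k):
    """Right-hand side of (28): (2 e^{u/sqrt 2} - e^{u sqrt 2})^{-k}."""
    return (2 * math.exp(u / math.sqrt(2)) - math.exp(u * math.sqrt(2))) ** (-k)

def tail_exact(k, t, jmax=1000):
    """P{|T_k - 4k| / sqrt 8 >= t} from (20), truncated at T_k = 2 jmax."""
    return sum(math.comb(j - 1, k - 1) / 2 ** j
               for j in range(k, jmax + 1)
               if abs(2 * j - 4 * k) / math.sqrt(8) >= t)

def tail_bound(k, t):
    """Right-hand side of (29): 2 * 2^k * e^{-t/2}."""
    return 2 * 2 ** k * math.exp(-t / 2)

print(mgf_tau_series(0.2), mgf_tau(0.2))
print(mgf_centered(0.5, 10) < 2 ** 10, mgf_centered(-0.5, 10) < 2 ** 10)  # True True
```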
4 From Random Walks to the Wiener Process: "Twist and Shrink"

Our construction of the Wiener process is based on that of P. Revesz [6, Section 6.2], which in turn is a simpler version of F.B. Knight's [4, Section 1.3]. The advantage of this method over the several other known constructions is that it is very natural and elementary.

We will define a sequence of approximations to the Wiener process, each of which is a 
"twisted and shrunk" random walk, a refinement of the previous one. It will be shown that 
this sequence converges to a process having the properties characterizing the Wiener process. 

Imagine that we observe a particle undergoing Brownian motion. In the first experiment we observe the particle exclusively when it hits points with integer coordinates $j \in \mathbb{Z}$. Suppose that this happens exactly at the consecutive time instants $1, 2, \dots$. To model the graph of the particle between the vertices so obtained, the simplest idea is to join them by straight line segments, as in Figure 1. Therefore the first approximation is

$$B_0(t) = S_0(t) = S(t),$$

where $t \ge 0$ is real and $S(t)$ is a random walk defined by (1) and (2).

Suppose that in the second experiment we observe the particle when it hits points with coordinates $j/2$ $(j \in \mathbb{Z})$, in the third experiment when it hits points with coordinates $j/2^2$ $(j \in \mathbb{Z})$, and so on. To model the second experiment, one idea is to take a second random walk $S_1(t)$, independent of the first one, and shrink it.

Then the first problem that arises is the relationship between the time and space scales. If one wants to compress the length of a step into half, how much does one have to compress the time needed for one step to preserve the essential properties of a random walk? Here we recall that by (3), the square root of the average squared distance of the random walk from the origin after time $n$ is $\sqrt{n}$. So when shrinking the random walk so that there are $n$ steps in one time unit, each step should have length $1/\sqrt{n}$. In this way, after one time unit the square root of the average squared distance of the walk from the origin will be about one spatial unit, as in the case of the original random walk. It means that when compressing the length of one step into $1/2$ (or in general $1/2^m$, $m = 1, 2, \dots$), one has to compress the time needed for one step into $1/2^2$ (in general, $1/2^{2m}$).

The second problem is that the sample-functions of $B_0(t)$ and of a shrunk version of an independent $S_1(t)$ have nothing to do with each other; in general the second is not a refinement of the first. For example, if $B_0(1) = 1$, then it is equally likely that the first integer hit by the shrunk version of $S_1(t)$ is $+1$ or $-1$.

Hence, before shrinking, we want to modify $S_1(t)$ so that it hits even integers $2j$ $(j \in \mathbb{Z})$ in exactly the same order as $S_0(t)$ hits the corresponding integers $j \in \mathbb{Z}$. Here a hit is counted only if it is different from the previous one. For example, if $S_0(1) = 1$ and $S_0(2) = 2$, then the first even integer $S_1(t)$ hits should be 2, and the next one (different from 2) should be 4. Thus if $S_1(t)$ hits the first even integer at time $T_1(1)$ and $S_1(T_1(1))$ happens to be $-2$, we reflect every step $X_1(k)$ of $S_1(t)$ for $0 < k \le T_1(1)$. In this way we get a modified random walk $\tilde S_1(t)$ up to time $T_1(1)$ such that $\tilde S_1(T_1(1)) = 2$. Then we continue similarly up to time $T_1(2)$: if the (already modified) walk hit 0 at time $T_1(2)$ (instead of 4), then we would reflect the steps $X_1(k)$ for $T_1(1) < k \le T_1(2)$. This modification process, which we will call "twisting", ensures that the next approximation is always a refinement of the previous one.

Now let us see the construction in detail. It begins with a sequence of independent random walks $S_0(t), S_1(t), \dots$. That is, for each $m \ge 0$,

$$S_m(0) = 0, \qquad S_m(n) = X_m(1) + X_m(2) + \cdots + X_m(n) \qquad (n \ge 1), \qquad (30)$$

where $X_m(k)$ $(m \ge 0,\ k \ge 1)$ is a double array of independent, identically distributed random variables such that

$$P\{X_m(k) = 1\} = P\{X_m(k) = -1\} = 1/2. \qquad (31)$$

First, using the "twist" method, we possibly modify $S_1(t), S_2(t), \dots$ one by one. This gives a sequence of (not independent) random walks $\tilde S_1(t), \tilde S_2(t), \dots$, each of which is a refinement of the previous one. Second, by shrinking we get a sequence $B_1(t), B_2(t), \dots$ approximating the Wiener process.

In accordance with the notation in (19), for $m \ge 1$, $S_m$ hits even integers (different from the previous one) exclusively at the random time instants

$$T_m(0) = 0, \qquad T_m(k) = \tau_m(1) + \tau_m(2) + \cdots + \tau_m(k) \qquad (k \ge 1).$$

Each random variable $T_m(k)$ has the same distribution as $T(k) = T_k$ above, see (20) and (21). That is, $T_m(k)$ is twice a negative binomial random variable, with expectation $4k$ and variance $8k$.

Now we define a suitable sequence of "twisted" random walks $\tilde S_m(t)$ $(m \ge 1)$ recursively, using $\tilde S_{m-1}(t)$, starting with

$$\tilde S_0(t) = S_0(t) \qquad (t \ge 0).$$

First we set

$$\tilde S_m(0) = 0.$$

Then for $k = 0, 1, \dots$ successively, and for every $n$ such that $T_m(k) < n \le T_m(k + 1)$, we take (Figures 2-4)

$$\tilde X_m(n) = \begin{cases} X_m(n) & \text{if } S_m(T_m(k + 1)) - S_m(T_m(k)) = 2\tilde X_{m-1}(k + 1), \\ -X_m(n) & \text{otherwise}, \end{cases} \qquad (32)$$

and

$$\tilde S_m(n) = \tilde S_m(n - 1) + \tilde X_m(n). \qquad (33)$$

Observe that the stopping times $T_m(k)$ corresponding to $\tilde S_m(t)$ are the same as the original ones $T_m(k)$ $(m \ge 0,\ k \ge 0)$.
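The twisting rule (32)-(33) is short enough to state as code. The following Python sketch is an illustration under simplifying assumptions, and the helper names are hypothetical:

- walks are finite lists of $\pm 1$ steps;
- a block of the fine walk is twisted only if the corresponding coarse step $\tilde X_{m-1}(k+1)$ is available.

It checks the property $\tilde S_m(T_m(k)) = 2\tilde S_{m-1}(k)$, stated as (39) after Lemma 1, and that twisting leaves the hitting instants unchanged.

```python
import random

def hitting_times(steps):
    """T(1) < T(2) < ...: instants at which the walk built from `steps` reaches an even
    integer different from the previous one (cf. (19))."""
    times, s, last = [], 0, 0
    for n, x in enumerate(steps, start=1):
        s += x
        if abs(s - last) == 2:
            times.append(n)
            last = s
    return times

def twist(coarse, fine):
    """Apply (32): the block of `fine` steps between T(k) and T(k+1) is kept if its
    displacement equals 2 * coarse[k] and reflected otherwise. Returns the twisted
    steps up to the last completed block, and the hitting times used."""
    times = hitting_times(fine)[:len(coarse)]
    twisted, start = [], 0
    for k, end in enumerate(times):
        block = fine[start:end]
        sign = 1 if sum(block) == 2 * coarse[k] else -1
        twisted.extend(sign * x for x in block)
        start = end
    return twisted, times

def partial_sums(steps):
    sums = [0]
    for x in steps:
        sums.append(sums[-1] + x)
    return sums

rng = random.Random(3)
coarse = [rng.choice((1, -1)) for _ in range(50)]   # plays the role of ~X_{m-1}
fine = [rng.choice((1, -1)) for _ in range(2000)]   # plays the role of X_m
twisted, times = twist(coarse, fine)
S_coarse, S_twisted = partial_sums(coarse), partial_sums(twisted)
print(all(S_twisted[T] == 2 * S_coarse[k + 1] for k, T in enumerate(times)))
```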

Lemma 1 For each $m \ge 0$, $\tilde S_m(t)$ $(t \ge 0)$ is a random walk; that is, $\tilde X_m(1), \tilde X_m(2), \dots$ is a sequence of independent, identically distributed random variables such that

$$P\{\tilde X_m(n) = 1\} = P\{\tilde X_m(n) = -1\} = 1/2 \qquad (n \ge 1). \qquad (34)$$
Figure 2: $B_0(t;\omega) = S_0(t;\omega)$.

Figure 5: $B_1(t;\omega)$.
Proof. We proceed by induction over $m \ge 0$. For $m = 0$, $\tilde S_0(t) = S_0(t)$, a random walk by definition. So assume that $\tilde S_{m-1}(t)$ is a random walk, where $m \ge 1$, and let us see whether this implies that $\tilde S_m(t)$ is a random walk too.

It is enough to show that for any $n \ge 1$ and any $\epsilon_j = \pm 1$ $(j = 1, \dots, n)$ we have

$$P\{\tilde X_m(1) = \epsilon_1, \dots, \tilde X_m(n - 1) = \epsilon_{n-1},\ \tilde X_m(n) = \epsilon_n\} = 1/2^n. \qquad (35)$$

We introduce two pieces of notation:

- the events $A_{m,r} = \{\tilde X_m(j) = \epsilon_j,\ 1 \le j \le r\}$ for $1 \le r \le n$, where $A_{m,0}$ is the sure event by definition;
- the random variables $\Delta S_{m,k} = S_m(T_m(k + 1)) - S_m(T_m(k))$ for $k \ge 0$.

The event $A_{m,n-1}$ determines the greatest integer $k \ge 0$ such that $T_m(k) \le n - 1$; let us denote this value by $\kappa$. Then $T_m(\kappa) < n \le T_m(\kappa + 1)$, so by (32),

$$P\{A_{m,n}\} = \sum_{\alpha = \pm 1} P\{A_{m,n-1},\ \alpha X_m(n) = \epsilon_n,\ \Delta S_{m,\kappa} = 2\alpha \tilde X_{m-1}(\kappa + 1)\}. \qquad (36)$$

The event $A_{m,n-1}$ can be written here as $B_{m,n-1} \cap C_{m,n-1}$, where

$$B_{m,n-1} = \{\tilde X_m(j) = \epsilon_j,\ 1 \le j \le T_m(\kappa)\},$$

$$C_{m,n-1} = \{X_m(j) = \alpha\epsilon_j,\ T_m(\kappa) + 1 \le j \le n - 1\}.$$

Definition (32) shows that $B_{m,n-1}$ is determined by $\tilde X_{m-1}(j)$ $(1 \le j \le \kappa)$ and $X_m(j)$ $(1 \le j \le T_m(\kappa))$. Their values do not influence anything else in (36).
Then we distinguish two cases according to the parity of $n$.

Case 1: $n$ is odd. Then $n - 1$ is even and $S_m(T_m(\kappa)) = S_m(n - 1)$. Indeed, $T_m(\kappa)$ is even, and before $T_m(\kappa + 1)$ the walk is back at the level $S_m(T_m(\kappa))$ at every even time. Further, let $\tau_{m,r} = \min\{j : j > 0,\ |S_m(r + j) - S_m(r)| = 2\}$ and $\Delta S_m(r) = S_m(r + \tau_{m,r}) - S_m(r)$ for $r \ge 0$. Then $S_m(T_m(\kappa + 1)) = S_m(n - 1 + \tau_{m,n-1})$ and $\Delta S_{m,\kappa} = \Delta S_m(n - 1)$. These facts and the argument above show that in (36), $A_{m,n-1}$ is independent of the other terms. Consequently, (36) simplifies to

$$P\{A_{m,n}\} = P\{A_{m,n-1}\} \sum_{\beta = \pm 1} P\{X_m(n) = \epsilon_n,\ \Delta S_m(n - 1) = 2\beta\}, \qquad (37)$$

since the value of $\alpha$ is immaterial and $P\{\tilde X_{m-1}(\kappa + 1) = \beta\} = 1/2$, independently of everything else here.

Finally, (17) and (18) can be applied to (37):

$$\begin{aligned}
P\{A_{m,n}\} &= P\{A_{m,n-1}\} \sum_{\beta = \pm 1} P\{\Delta S_m(n - 1) = 2\beta \mid X_m(n) = \epsilon_n\}\, P\{X_m(n) = \epsilon_n\} \\
&= P\{A_{m,n-1}\} \left(\frac{3}{4} + \frac{1}{4}\right) \frac{1}{2} = \frac{1}{2} P\{A_{m,n-1}\},
\end{aligned}$$

independently of $\epsilon_n$.
Case 2: $n$ is even. Then $n - 2$ is even, and the argument in Case 1 can be repeated with $n - 2$ in place of $n - 1$. The only exception is that in (36) we have an additional term $\tilde X_m(n - 1) = \alpha X_m(n - 1)$. Then, instead of (37), we arrive at

$$\begin{aligned}
P\{A_{m,n}\} &= P\{A_{m,n-2}\} \sum_{\beta = \pm 1} P\{X_m(n - 1) = \epsilon_{n-1},\ X_m(n) = \epsilon_n,\ \Delta S_m(n - 2) = 2\beta\} \\
&= P\{A_{m,n-2}\}\, \frac{1}{4} \sum_{\beta = \pm 1} P\{\Delta S_m(n - 2) = 2\beta \mid X_m(n - 1) = \epsilon_{n-1},\ X_m(n) = \epsilon_n\}. \qquad (38)
\end{aligned}$$

The conditional probability in (38) is

- 1 if $\beta = \epsilon_{n-1} = \epsilon_n$;
- 0 if $-\beta = \epsilon_{n-1} = \epsilon_n$;
- 1/2 if $\beta = \epsilon_{n-1} = -\epsilon_n$;
- 1/2 if $-\beta = \epsilon_{n-1} = -\epsilon_n$.

Thus the sum in (38) becomes $1 + 0 = 1$ if $\epsilon_{n-1} = \epsilon_n$, and $(1/2) + (1/2) = 1$ if $\epsilon_{n-1} = -\epsilon_n$. In other words, the value of the sum in (38) is 1, independently of $\epsilon_{n-1}$ and $\epsilon_n$.

In summary, $P\{A_{m,n}\} = \frac{1}{2} P\{A_{m,n-1}\}$ if $n$ is odd and $P\{A_{m,n}\} = \frac{1}{4} P\{A_{m,n-2}\}$ if $n$ is even. Since $P\{A_{m,0}\} = 1$, (35) follows. $\square$

We mention that another way to prove Lemma 1 is to introduce the random variables $Z_k = \frac{1}{2}\Delta S_{m,k-1}\, \tilde X_{m-1}(k)$ for $k \ge 1$. It can be shown that $Z_1, Z_2, \dots$ is a sequence of independent and identically distributed random variables with $P\{Z_k = 1\} = P\{Z_k = -1\} = 1/2$. It can also be shown that this sequence is independent of the sequence $X_m(1), X_m(2), \dots$. Then we have $\tilde X_m(n) = Z_k X_m(n)$ for each $n$ such that $T_m(k - 1) < n \le T_m(k)$ $(k \ge 1)$, and this implies (35).

The main property we aimed at when introducing the "twist" method easily follows from (32) and (33):

$$\tilde S_m(T_m(k)) = \sum_{j=1}^{k} \left(\tilde S_m(T_m(j)) - \tilde S_m(T_m(j - 1))\right) = \sum_{j=1}^{k} 2\tilde X_{m-1}(j) = 2\tilde S_{m-1}(k), \qquad (39)$$

for any $m \ge 1$ and $k \ge 0$.

Now the second step of the approximation comes: "shrinking". As discussed above, at the $m$th approximation the length of one step should be $1/2^m$ and the time needed for a step should be $1/2^{2m}$ (Figure 5). So we define the $m$th approximation of the Wiener process by

$$B_m\left(\frac{t}{2^{2m}}\right) = \frac{1}{2^m}\tilde S_m(t) \qquad (t \ge 0,\ m \ge 0), \qquad (40)$$

or equivalently, $B_m(t) = 2^{-m}\tilde S_m(t\,2^{2m})$. Basically, $B_m(t)$ is a model of Brownian motion on the set of points $x = j/2^m$ $(j \in \mathbb{Z})$.
Now (39) becomes the following refinement property:

$$B_m\left(\frac{T_m(k)}{2^{2m}}\right) = B_{m-1}\left(\frac{k}{2^{2(m-1)}}\right) \qquad (41)$$

for any $m \ge 1$ and $k \ge 0$.
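Combining the twist with the scaling (40) gives a small simulation of the approximations $B_0, B_1, \dots$. The Python sketch below is illustrative only. It works as follows:

- it starts from a finite walk $\tilde S_0$ with $n_0$ steps;
- for each level it draws a fresh walk with a generous (assumed) number of raw steps and twists it;
- it evaluates $B_m$ by linear interpolation, as in (2), and simply holds it constant past the last simulated step;
- it checks the refinement property (41) at the hitting instants.

Exact floating-point equality is safe here, because all times and positions involved are dyadic rationals of moderate size.

```python
import random

def hitting_times(steps):
    """Instants at which the walk reaches an even integer different from the previous one."""
    times, s, last = [], 0, 0
    for n, x in enumerate(steps, start=1):
        s += x
        if abs(s - last) == 2:
            times.append(n)
            last = s
    return times

def build_levels(levels, n0, seed=0):
    """Twisted walks ~S_0, ..., ~S_levels as step lists, with their hitting times.
    The factor 8 in the number of raw steps is an assumed safety margin (E T(k) = 4k)."""
    rng = random.Random(seed)
    walks = [[rng.choice((1, -1)) for _ in range(n0)]]
    times_per_level = [[]]
    for m in range(1, levels + 1):
        coarse = walks[-1]
        raw = [rng.choice((1, -1)) for _ in range(8 * len(coarse))]
        times = hitting_times(raw)[:len(coarse)]
        twisted, start = [], 0
        for k, end in enumerate(times):
            sign = 1 if sum(raw[start:end]) == 2 * coarse[k] else -1
            twisted.extend(sign * x for x in raw[start:end])
            start = end
        walks.append(twisted)
        times_per_level.append(times)
    return walks, times_per_level

def B(m, steps, t):
    """(40): B_m(t) = 2^{-m} ~S_m(t 2^{2m}), linearly interpolated between integer times."""
    u = t * 4 ** m
    n = int(u)
    value = sum(steps[:n]) + ((u - n) * steps[n] if n < len(steps) else 0)
    return value / 2 ** m

walks, times = build_levels(levels=3, n0=50)
for m in range(1, 4):
    ok = all(B(m, walks[m], T / 4 ** m) == B(m - 1, walks[m - 1], (k + 1) / 4 ** (m - 1))
             for k, T in enumerate(times[m]))
    print(m, len(walks[m]), ok)
```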



The remaining part of this section is devoted to showing that the sequence $B_m(t)$ $(m = 0, 1, 2, \dots)$ converges, and that the limiting process has the characterizing properties of the Wiener process. In proving these, our basic tools will be some relatively simple but powerful observations.

First, the following crude but still efficient estimate will often be applied in the sequel:

$$P\left\{\max_j Z_j \ge t\right\} = P\left\{\bigcup_j \{Z_j \ge t\}\right\} \le \sum_j P\{Z_j \ge t\}, \qquad (42)$$

which is valid for any finite collection of random variables $Z_j$ and any real number $t$.

The proofs of Lemmas 3 and 4 below essentially consist of applying the following large deviation type estimate, which holds for $S_n$ and for $(T_k - 4k)/\sqrt{8}$ by Theorems 1(b) and 2(b). The exponential Chebyshev inequalities (12) and (29) mentioned above will also be used. Note that in the next lemma we have $a = 2$ and $b = 1$ for $S_n$ in (12), and $a = 2$ and $b = 1/2$ for $(T_k - 4k)/\sqrt{8}$ in (29).

Lemma 2 Suppose that for $j \ge 0$ we have $E(Z_j) = 0$, $\mathrm{Var}(Z_j) = j$, and, with some $a > 1$ and $b > 0$,

$$P\{|Z_j| \ge t\} \le 2a^j e^{-bt} \qquad (t > 0)$$

(exponential Chebyshev-type inequality). Assume as well that there exists a $j_0 > 0$ such that for any $j \ge j_0$,

$$P\left\{|Z_j|/\sqrt{j} \ge x_j\right\} \le e^{-x_j^2/2}$$

whenever $x_j \to \infty$ and $x_j = o(j^{1/6})$ as $j \to \infty$ (large deviation type inequality). Then for any $C > 1$,

$$P\left\{\max_{0 \le j \le N} |Z_j| \ge \sqrt{2CN\log N}\right\} \le 2N^{1-C} \qquad (43)$$

if $N$ is large enough, $N \ge N_0(C)$.

Proof. The maximum in (43) can be handled by the crude estimate (42). Divide the resulting sum into two parts: one that can be estimated by the large deviation type inequality, and another that will be estimated using the exponential Chebyshev inequality. For the large deviation part we take $x_j = \sqrt{2C\log N}$. Since $j \le N$, the event $\{|Z_j| \ge \sqrt{2CN\log N}\}$ is contained in $\{|Z_j|/\sqrt{j} \ge \sqrt{2C\log N}\}$. Moreover, $j \to \infty$ implies that $N \to \infty$, and then $x_j \to \infty$ as well. If $j \ge \log^4 N$, then the condition $x_j = o(j^{1/6})$ holds too, and the large deviation type inequality is applicable. Thus

$$\begin{aligned}
P\left\{\max_{0 \le j \le N} |Z_j| \ge \sqrt{2CN\log N}\right\}
&\le \sum_{j=0}^{\lfloor \log^4 N \rfloor} 2a^j \exp\left(-b\sqrt{2CN\log N}\right) + \sum_{j=\lfloor \log^4 N \rfloor + 1}^{N} P\left\{|Z_j|/\sqrt{j} \ge \sqrt{2C\log N}\right\} \\
&\le \frac{2a}{a - 1} \exp\left(\log a \log^4 N - b\sqrt{2CN\log N}\right) + N\exp(-C\log N) < 2N^{1-C}
\end{aligned}$$

if $C > 1$ and $N \ge N_0(C)$. ($\lfloor x \rfloor$ denotes the greatest integer less than or equal to $x$.) $\square$

Note that the lemma and its proof are valid even when $N$ is not an integer. Here and afterwards we use the following convention: if the upper limit of a sum is a real value $N$, then the sum goes up to $\lfloor N \rfloor$.

We mention that both inequalities among the assumptions of Lemma 2 hold for the partial sums $Z_j$ of any sequence of independent and identically distributed random variables with expectation 0, variance 1, and a moment generating function which is finite at $\pm u_0$ for some $u_0 > 0$. That an exponential Chebyshev-type inequality holds then can be seen from (10) and (11), while the large deviation type estimate is shown to hold e.g. in [3, Section XVI, 6].

The first Borel-Cantelli lemma [2, Section VIII, 3] is also an important tool. It states that if $A_1, A_2, \ldots$ is an infinite sequence of events such that $\sum_{n=1}^{\infty} P\{A_n\}$ is finite, then with probability 1 only finitely many of the events occur. In another widely used terminology: almost surely only finitely many of them occur.

Now turning to the convergence proof, as the first step, it will be shown that the time instants $T_{m+1}(k)/2^{2(m+1)}$ will get arbitrarily close to the time instants $k/2^{2m} = 4k/2^{2(m+1)}$ as $m \to \infty$. By (41), this means that the next approximation not only visits the points $x = j/2^m$ $(j \in \mathbb{Z})$ in the same order, but the corresponding time instants will get arbitrarily close to each other as $m \to \infty$. Remember that by (20) and (21), $T_m(k)$ is the double of a negative binomial random variable, with expectation $4k$ and variance $8k$. Here Lemma 2 will be applied to $(T_m(k) - 4k)/\sqrt{8}$ with $N = K2^{2m}$. So $\log N = \log K + (2\log 2)m < 1.5m$ if $m$ is large enough, $m \ge m_0(K)$, and then $\sqrt{2CN\log N} < \sqrt{3CKm}\,2^m$.
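The distributional facts quoted above can be checked by simulation. In this minimal Python sketch, assuming only that $T(k)$ is a sum of $k$ independent copies of $2G$ with $G$ geometric with parameter $1/2$ on $\{1, 2, \ldots\}$ (the helper names are illustrative), the sample mean and variance of $T(k)$ should be close to $4k$ and $8k$.

```python
import random
import statistics

def doubled_geometric(rng):
    """2G with G ~ Geometric(1/2) on {1, 2, ...}: mean 4, variance 8."""
    g = 1
    while rng.random() < 0.5:
        g += 1
    return 2 * g

def waiting_time(k, rng):
    """T(k): the sum of k independent doubled geometric waiting times."""
    return sum(doubled_geometric(rng) for _ in range(k))

rng = random.Random(42)
k = 50
samples = [waiting_time(k, rng) for _ in range(4000)]
print(statistics.mean(samples), 4 * k)      # sample mean vs 4k
print(statistics.variance(samples), 8 * k)  # sample variance vs 8k
```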

Lemma 3 (a) For any $C > 1$, $K > 0$, and for any $m \ge m_0(C,K)$ we have

$$P\Big\{\max_{0\le k/2^{2m}\le K}|T_{m+1}(k) - 4k| \ge \sqrt{24CKm}\,2^m\Big\} \le 2(K2^{2m})^{1-C}. \qquad (44)$$

(b) For any $K > 0$,

$$\max_{0\le k/2^{2m}\le K}\left|\frac{T_{m+1}(k)}{2^{2(m+1)}} - \frac{k}{2^{2m}}\right| < \sqrt{2Km}\,2^{-m} \qquad (45)$$

with probability 1 for all but finitely many $m$.
Proof.

(a) (44) is a direct consequence of Lemma 2.

(b) Take, for example, $C = 4/3$ in (a) and define the following events for $m \ge 0$:

$$A_m = \Big\{\max_{0\le k/2^{2m}\le K}|T_{m+1}(k) - 4k| \ge \sqrt{32Km}\,2^m\Big\}.$$

By (44), for $m \ge m_0(C,K)$, $P\{A_m\} \le 2(K2^{2m})^{-1/3}$. Then $\sum_{m=0}^{\infty} P\{A_m\} < \infty$. Hence the Borel-Cantelli lemma implies that with probability 1, only finitely many of the events $A_m$ occur. That is, almost surely for all but finitely many $m$ we have

$$\max_{0\le k/2^{2m}\le K}|T_{m+1}(k) - 4k| < \sqrt{32Km}\,2^m.$$

This inequality is equivalent to (45). □
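To see (45) on single sample paths, the sketch below computes its left-hand side with $K = 1$ for a few values of $m$ and compares it with $\sqrt{2Km}\,2^{-m}$. It assumes, as stated before Lemma 3, that the increments of $T_{m+1}(k)$ are independent doubled geometric variables; the helper names are illustrative. Since (45) holds only for all but finitely many $m$, one path satisfying it for these $m$ is evidence, not proof.

```python
import math
import random

def doubled_geometric(rng):
    """2G with G ~ Geometric(1/2) on {1, 2, ...}."""
    g = 1
    while rng.random() < 0.5:
        g += 1
    return 2 * g

def lemma3_gap(m, K, rng):
    """max over 0 <= k <= K 4^m of |T_{m+1}(k) / 4^(m+1) - k / 4^m| along one path."""
    t, worst = 0, 0.0
    for k in range(1, int(K * 4 ** m) + 1):
        t += doubled_geometric(rng)  # t is now T_{m+1}(k)
        worst = max(worst, abs(t / 4 ** (m + 1) - k / 4 ** m))
    return worst

rng = random.Random(7)
results = [(m, lemma3_gap(m, 1, rng), math.sqrt(2 * m) * 2.0 ** -m) for m in (4, 6, 8)]
for m, gap, bound in results:
    print(f"m={m}: gap={gap:.5f}, bound={bound:.5f}")
```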

It seems important to emphasize a "weakness" of a statement like the one in Lemma 3(b): we use the phrase "all but finitely many $m$" to indicate that the statement holds for every $m \ge m_0(\omega)$, where $m_0(\omega)$ may depend on the specific point $\omega$ of the sample space. In other words, in general there is no common, uniform lower bound for $m$.

Next we want to show that for any $j \ge 1$, $B_{n+j}(t)$ will be arbitrarily close to $B_n(t)$ as $n \to \infty$. Here again Lemma 2 will be applied, this time to a random walk $S(r)$, with a properly chosen $N'$ and $C'$ (instead of $N = K2^{2m}$ and $C$). Although the proof will be somewhat long, its basic idea is simple. Since $B_{m+1}(T_{m+1}(k)/2^{2(m+1)}) = B_m(k/2^{2m})$ by (39), and the difference of the corresponding time instants here approaches zero fast as $m \to \infty$ by (45), one can show that $B_m(t)$ and its refinement $B_{m+1}(t)$ will get very close to each other too.
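The construction of $B_{m+1}$ from $B_m$ is given in Section 4, which is not part of this excerpt. The refinement rule in the sketch below (a fresh simple walk run until it moves by 2, with the block sign-flipped when needed so that its net displacement is $2X_m(k)$) is our assumption about that construction, written only to illustrate the identity $S_{m+1}(T_{m+1}(k)) = 2S_m(k)$, i.e. $B_{m+1}(T_{m+1}(k)/2^{2(m+1)}) = B_m(k/2^{2m})$, and the unscaled discrepancy appearing in (50). The function names are hypothetical.

```python
import random

def partial_sums(steps):
    """Return S(0), S(1), ..., S(len(steps)) for a list of +-1 steps."""
    sums = [0]
    for x in steps:
        sums.append(sums[-1] + x)
    return sums

def refine(steps_m, rng):
    """Hypothetical one-step refinement S_m -> S_{m+1} (assumed twisting rule).

    Each coarse step x = +-1 is replaced by a block of fair +-1 steps, run until
    the block's partial sum reaches +-2; the block is sign-flipped when needed so
    that its net displacement is 2 * x. Returns the fine steps and the stopping
    times T_{m+1}(0) = 0, T_{m+1}(1), ..., T_{m+1}(len(steps_m)).
    """
    steps_next, times = [], [0]
    for x in steps_m:
        block, pos = [], 0
        while abs(pos) < 2:
            step = rng.choice((-1, 1))
            block.append(step)
            pos += step
        if pos != 2 * x:
            block = [-s for s in block]
        steps_next.extend(block)
        times.append(times[-1] + len(block))
    return steps_next, times

rng = random.Random(3)
steps_m = [rng.choice((-1, 1)) for _ in range(1000)]
steps_next, times = refine(steps_m, rng)
s_m, s_next = partial_sums(steps_m), partial_sums(steps_next)

# Refinement property: S_{m+1}(T_{m+1}(k)) = 2 S_m(k) for every k.
print(all(s_next[t] == 2 * s for t, s in zip(times, s_m)))  # True
# Unscaled discrepancy from (50): max_k |S_{m+1}(4k) - S_{m+1}(T_{m+1}(k))|.
print(max(abs(s_next[4 * k] - s_next[t]) for k, t in enumerate(times) if 4 * k < len(s_next)))
```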

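The following elementary fact, needed in the proof, is discussed before stating the lemma:

$$\sum_{m=n}^{\infty} m2^{-m/2} = \sum_{m=n}^{\infty} m\big(1/\sqrt{2}\big)^m < 4n2^{-n/2} \qquad (46)$$

for $n \ge 15$. This can be shown by a routine application of power series: for $|x| < 1$,

$$\sum_{m=n}^{\infty} mx^m = x\frac{d}{dx}\sum_{m=n}^{\infty} x^m = x\frac{d}{dx}\left(\frac{x^n}{1-x}\right) = nx^n\left(\frac{1}{1-x} + \frac{x}{n(1-x)^2}\right).$$

Substituting $x = 1/\sqrt{2}$, one gets (46) for $n \ge 15$.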
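The threshold $n \ge 15$ in (46) can be confirmed numerically. The short sketch below (illustrative function names) compares a long partial sum with the closed form derived above and shows that the inequality first holds at $n = 15$: with $x = 1/\sqrt{2}$ the sum equals $2^{-n/2}(3.414\ldots n + 8.242\ldots)$, which is below $4n2^{-n/2}$ exactly when $n > 14.07\ldots$.

```python
import math

def tail_sum(n, terms=2000):
    """Numerically sum m * 2^(-m/2) for m = n, ..., n + terms - 1."""
    return sum(m * 2 ** (-m / 2) for m in range(n, n + terms))

def closed_form(n):
    """n x^n (1/(1-x) + x/(n (1-x)^2)) with x = 1/sqrt(2)."""
    x = 1 / math.sqrt(2)
    return n * x ** n * (1 / (1 - x) + x / (n * (1 - x) ** 2))

for n in (14, 15, 20):
    print(n, tail_sum(n) < 4 * n * 2 ** (-n / 2))  # False, True, True
```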

Lemma 4 (a) For any $C > 3/2$, $K > 0$, and for any $n \ge n_0(C,K)$ we have

$$P\Big\{\max_{0\le k/2^{2n}\le K}\big|B_{n+1}\big(T_{n+1}(k)/2^{2(n+1)}\big) - B_{n+1}\big(k/2^{2n}\big)\big| \ge (1/8)\,n2^{-n/2}\Big\} \le 3(K2^{2n})^{1-C} \qquad (47)$$

and

$$P\Big\{\max_{0\le t\le K}|B_{n+j}(t) - B_n(t)| \ge n2^{-n/2} \text{ for some } j \ge 1\Big\} \le 6(K2^{2n})^{1-C}. \qquad (48)$$

(b) For any $K > 0$,

$$\max_{0\le t\le K}|B_{n+j}(t) - B_n(t)| < n2^{-n/2} \qquad (49)$$

with probability 1 for all $j \ge 1$ and for all but finitely many $n$.



Proof. Let us consider first the difference between two consecutive approximations, $\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)|$. The maximum over real values $t$ can be approximated by the maximum over dyadic rational numbers $k/2^{2m}$. This is true because any sample-function $B_m(t;\omega)$ is a broken line such that, by (40), the magnitude of the increment between two consecutive points $k/2^{2m}$ and $(k+1)/2^{2m}$ is equal to $2^{-m}$. Thus, taking the integer $t_m = \lfloor t2^{2m}\rfloor$ for each $t \in [0,K]$, one has $t_m/2^{2m} \le t < (t_m+1)/2^{2m}$ and so $4t_m/2^{2(m+1)} \le t < (4t_m+4)/2^{2(m+1)}$. So we get $|B_m(t) - B_m(t_m/2^{2m})| \le 2^{-m}$ and $|B_{m+1}(t) - B_{m+1}(4t_m/2^{2(m+1)})| \le 4\cdot 2^{-(m+1)} = 2\cdot 2^{-m}$. Hence

$$\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| \le \max_{0\le k/2^{2m}\le K}\big|B_{m+1}\big(4k/2^{2(m+1)}\big) - B_m\big(k/2^{2m}\big)\big| + 3\cdot 2^{-m}.$$

Moreover, by (41) and (40) we have

$$\begin{aligned}
B_{m+1}\big(4k/2^{2(m+1)}\big) - B_m\big(k/2^{2m}\big) &= B_{m+1}\big(4k/2^{2(m+1)}\big) - B_{m+1}\big(T_{m+1}(k)/2^{2(m+1)}\big)\\
&= 2^{-(m+1)}S_{m+1}(4k) - 2^{-(m+1)}S_{m+1}\big(T_{m+1}(k)\big).
\end{aligned} \qquad (50)$$

Thus

$$\begin{aligned}
P\Big\{\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| \ge (1/4)m2^{-m/2}\Big\}
&\le P\Big\{\max_{0\le k/2^{2m}\le K}\big|B_{m+1}\big(4k/2^{2(m+1)}\big) - B_m\big(k/2^{2m}\big)\big| \ge (1/8)m2^{-m/2}\Big\}\\
&= P\Big\{\max_{0\le k\le K2^{2m}}\big|S_{m+1}(4k) - S_{m+1}\big(T_{m+1}(k)\big)\big| \ge (1/4)m2^{m/2}\Big\}
\end{aligned} \qquad (51)$$

if $m$ is large enough (so that $3\cdot 2^{-m} \le (1/8)m2^{-m/2}$).

By Lemma 3, the probability of the event

$$A_m = \Big\{\max_{0\le k/2^{2m}\le K}|T_{m+1}(k) - 4k| \ge \sqrt{24CKm}\,2^m\Big\}$$

is very small for $m$ large. Therefore divide the last expression in (51) into two parts according to $A_m$ and $A_m^c$ (the complement of $A_m$):

$$\begin{aligned}
P\Big\{\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| \ge (1/4)m2^{-m/2}\Big\}
&\le P\Big\{A_m^c;\ \max_{0\le k\le K2^{2m}}\big|S_{m+1}(4k) - S_{m+1}\big(T_{m+1}(k)\big)\big| \ge (1/4)m2^{m/2}\Big\} + P\{A_m\}\\
&\le \sum_{k=1}^{K2^{2m}} P\Big\{\max_{j:\,|j-4k|\le\sqrt{24CKm}\,2^m}\big|S_{m+1}(4k) - S_{m+1}(j)\big| \ge (1/4)m2^{m/2}\Big\} + 2(K2^{2m})^{1-C},
\end{aligned} \qquad (52)$$

where the crude estimate (42) and Lemma 3(a) were used.

Now apply Lemma 2 to $S_{m+1}(j) - S_{m+1}(4k)$ here, with suitably chosen $N'$ and $C'$. For $k$ fixed and $j > 4k$, $S_{m+1}(j) - S_{m+1}(4k) = \sum_{r=4k+1}^{j} X_{m+1}(r)$ is a random walk of the form $S(j - 4k)$. (The case $j < 4k$ is symmetric.) Since $|j - 4k| \le \sqrt{24CKm}\,2^m$, $N'$ is taken as $\sqrt{24CKm}\,2^m$. (So $N'$ is roughly $\sqrt{N}$, where $N = K2^{2m}$.) Then $\log N' = (1/2)\log(24CKm) + (\log 2)m < m$ if $m$ is large enough, depending on $C$ and $K$. So

$$\sqrt{2C'N'\log N'} < \sqrt{2C'm\sqrt{24CKm}\,2^m} < (1/4)m2^{m/2}$$

if $m$ is large enough, depending on $C$, $C'$, and $K$. Then it follows by Lemma 2 that

$$P\Big\{\max_{0\le r\le\sqrt{24CKm}\,2^m}|S(r)| \ge (1/4)m2^{m/2}\Big\} \le 2\big(\sqrt{24CKm}\,2^m\big)^{1-C'}. \qquad (53)$$

The second term of the error probability in (52) is $2(K2^{2m})^{1-C} = 2N^{1-C}$, while (53) indicates that the first term is at most $K2^{2m}\cdot 2\cdot 2\big(\sqrt{24CKm}\,2^m\big)^{1-C'} \le N(\sqrt{N})^{1-C'}$ if $C' > 1$ and $m$ is large enough (the first factor 2 accounts for the two cases $j > 4k$ and $j < 4k$). To make the two error terms be of the same order, choose $1 + (1 - C')/2 = 1 - C$, i.e. $C' = 2C + 1$. Thus (52) becomes

$$P\Big\{\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| \ge (1/4)m2^{-m/2}\Big\} \le 3(K2^{2m})^{1-C}$$

for any $m$ large enough, depending on $C$ and $K$. Comparing this to (50) and (51), one obtains (47).

By (46), $\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| < (1/4)m2^{-m/2}$ for all $m \ge n \ge 15$ would imply that

$$\max_{0\le t\le K}|B_{n+j}(t) - B_n(t)| = \max_{0\le t\le K}\Big|\sum_{m=n}^{n+j-1}\big(B_{m+1}(t) - B_m(t)\big)\Big| \le \sum_{m=n}^{n+j-1}\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| < \sum_{m=n}^{\infty}(1/4)m2^{-m/2} < n2^{-n/2}$$

for any $j \ge 1$. So we conclude that

$$\begin{aligned}
P\Big\{\max_{0\le t\le K}|B_{n+j}(t) - B_n(t)| \ge n2^{-n/2} \text{ for some } j \ge 1\Big\}
&\le \sum_{m=n}^{\infty} P\Big\{\max_{0\le t\le K}|B_{m+1}(t) - B_m(t)| \ge (1/4)m2^{-m/2}\Big\}\\
&\le \sum_{m=n}^{\infty} 3(K2^{2m})^{1-C} = \frac{3(K2^{2n})^{1-C}}{1 - 2^{2(1-C)}} \le 6(K2^{2n})^{1-C}
\end{aligned}$$

if $C \ge 3/2$ (say), for any $n \ge n_0(C,K)$. This proves (48).

The statement in (b) follows from (48) by the Borel-Cantelli lemma, as in the proof of 
Lemma 3. □ 
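The last step of the proof of (48) is a geometric series: $\sum_{m\ge n}(K2^{2m})^{1-C} = (K2^{2n})^{1-C}/(1 - 4^{1-C})$, and the constant $3/(1 - 4^{1-C})$ is at most 6 exactly when $C \ge 3/2$. The small sketch below (illustrative names) checks this arithmetic, including the equality case $C = 3/2$, where the bound is attained.

```python
def series_constant(c):
    """sum_{m>=n} (K 4^m)^(1-C) divided by (K 4^n)^(1-C), i.e. 1 / (1 - 4^(1-C))."""
    return 1 / (1 - 4 ** (1 - c))

def partial_series(c, n, k=1.0, terms=400):
    """Direct partial sum of 3 (K 4^m)^(1-C) for m = n, ..., n + terms - 1."""
    return sum(3 * (k * 4 ** m) ** (1 - c) for m in range(n, n + terms))

for c in (1.25, 1.5, 2.0):
    print(c, 3 * series_constant(c))  # at most 6 exactly when C >= 3/2
```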
Now we are ready to establish the existence of the Wiener process, which is a continuous model of Brownian motion. An important consequence of (49) is that the difference between the Wiener process and $B_n(t)$ is smaller than a constant multiple of $\log N/\sqrt[4]{N}$, where $N = K2^{2n}$; see (55) below.

Theorem 3 As $n \to \infty$, with probability 1 (that is, for almost all $\omega \in \Omega$) and for all $t \in [0,\infty)$, the sample-functions $B_n(t;\omega)$ converge to a sample-function $W(t;\omega)$ such that

(i) $W(0;\omega) = 0$, and $W(t;\omega)$ is a continuous function of $t$ on the interval $[0,\infty)$;

(ii) for any $0 \le s < t$, $W(t) - W(s)$ is a normally distributed random variable with expectation 0 and variance $t - s$;

(iii) for any $0 \le s < t \le u < v$, the increments $W(t) - W(s)$ and $W(v) - W(u)$ are independent random variables.

By definition, $W(t)$ $(t \ge 0)$ is called the Wiener process.

Further, we have the following estimates for the difference of the Wiener process and its approximations.

(a) For any $C > 3/2$, $K > 0$, and for any $n \ge n_0(C,K)$ we have

$$P\Big\{\max_{0\le t\le K}|W(t) - B_n(t)| \ge n2^{-n/2}\Big\} \le 6(K2^{2n})^{1-C}. \qquad (54)$$

(b) For any $K > 0$,

$$\max_{0\le t\le K}|W(t) - B_n(t)| \le n2^{-n/2} \qquad (55)$$

with probability 1 for all but finitely many $n$.

Proof. Lemma 4(b) shows that for almost all $\omega \in \Omega$, the sequence $B_n(t;\omega)$ converges for any $t \ge 0$ as $n \to \infty$. Let us denote the limit by $W(t;\omega)$. On a probability zero $\omega$-set the limit possibly does not exist; there one can define $W(t;\omega) = 0$ for any $t \ge 0$. Since $B_n(0;\omega) = 0$ for any $n$, it follows that $W(0;\omega) = 0$ for any $\omega \in \Omega$.

Taking $j \to \infty$ in (48), (54) follows. By (49), the convergence of $B_n(t)$ is uniform on any bounded interval $[0,K]$; more exactly, for any $K > 0$ we have (55) with probability 1. Textbooks on advanced calculus, like W. Rudin's [7, Section 7.12], show that the limit function of a uniformly convergent sequence of continuous functions is also continuous. This proves (i).

Now we turn to the proof of (ii). Take arbitrary $t > s \ge 0$ and real $x$. With $K \ge t$ fixed, (54) shows that for any $\delta > 0$ there exists an $n \ge n_0(C,K)$ such that

$$P\Big\{\max_{0\le u\le K}|W(u) - B_n(u)| \ge \delta\Big\} \le \delta. \qquad (56)$$

Since

$$P\{W(t) - W(s) \le x\} = P\big\{B_n(t) - B_n(s) \le x - (W(t) - B_n(t)) + (W(s) - B_n(s))\big\},$$

(56) implies that

$$P\{B_n(t) - B_n(s) \le x - 2\delta\} - 2\delta \le P\{W(t) - W(s) \le x\} \le P\{B_n(t) - B_n(s) \le x + 2\delta\} + 2\delta. \qquad (57)$$
This indicates that the distribution function of $W(t) - W(s)$ can eventually be obtained from the distribution function of

$$B_n(t) - B_n(s) = 2^{-n}S_n(2^{2n}t) - 2^{-n}S_n(2^{2n}s). \qquad (58)$$

Take the non-negative integers $j_n = \lfloor 2^{2n}t\rfloor$ and $i_n = \lfloor 2^{2n}s\rfloor$, $j_n \ge i_n$. Then (58) differs from

$$2^{-n}\big(S_n(j_n) - S_n(i_n)\big) = 2^{-n}\sum_{k=i_n+1}^{j_n} X_n(k) \qquad (59)$$

by an error not more than $2\cdot 2^{-n} < \delta$. (We can assume that $n$ was chosen so.) Also, $j_n - i_n$ differs from $2^{2n}(t - s)$ by at most 1. In particular, $j_n - i_n \to \infty$ as $n \to \infty$.

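If $n$ is large enough (we can assume again that $n$ was chosen so), by Theorem 1(a), for any fixed real $x'$ we have

$$\Phi(x') - \delta \le P\Big\{\frac{1}{\sqrt{j_n - i_n}}\sum_{k=i_n+1}^{j_n} X_n(k) \le x'\Big\} \le \Phi(x') + \delta. \qquad (60)$$

Here $\sqrt{j_n - i_n}$ can be approximated by $2^n\sqrt{t - s}$ if $n$ is large enough:

$$1 - \delta \le \sqrt{\frac{j_n - i_n - 1}{j_n - i_n}} \le \frac{2^n\sqrt{t - s}}{\sqrt{j_n - i_n}} \le \sqrt{\frac{j_n - i_n + 1}{j_n - i_n}} \le 1 + \delta. \qquad (61)$$

Combining formulae (58)-(61) we obtain that, for $x \ge 0$,

$$\Phi\Big((1 - \delta)\frac{x}{\sqrt{t - s}} - \frac{2\delta}{\sqrt{t - s}}\Big) - \delta \le P\{B_n(t) - B_n(s) \le x\} \le \Phi\Big((1 + \delta)\frac{x}{\sqrt{t - s}} + \frac{2\delta}{\sqrt{t - s}}\Big) + \delta,$$

and similarly for $x < 0$, with the roles of $1 - \delta$ and $1 + \delta$ interchanged. This shows that the distribution of $B_n(t) - B_n(s)$ is asymptotically normal with mean 0 and variance $t - s$ as $n \to \infty$. Moreover, by (57), the distribution of $W(t) - W(s)$ is exactly normal with mean 0 and variance $t - s$, since $\delta$ can be made arbitrarily small if $n$ is large enough:

$$P\{W(t) - W(s) \le x\} = \Phi\Big(\frac{x}{\sqrt{t - s}}\Big).$$

This proves (ii).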

Finally, (iii) can be proved similarly to (ii) above. Taking arbitrary $v > u \ge t > s \ge 0$ and real numbers $x$, $y$,

$$P\{W(t) - W(s) \le x,\ W(v) - W(u) \le y\} \qquad (62)$$

can be approximated by a probability of the form

$$P\{B_n(t) - B_n(s) \le x,\ B_n(v) - B_n(u) \le y\}$$

arbitrarily well if $n$ is large enough, just as in (57). In turn, as in (59), the latter can be estimated arbitrarily well by a probability of the form

$$P\Big\{\frac{1}{\sqrt{j_n - i_n}}\sum_{k=i_n+1}^{j_n} X_n(k) \le x',\ \frac{1}{\sqrt{r_n - q_n}}\sum_{k=q_n+1}^{r_n} X_n(k) \le y'\Big\}, \qquad (63)$$

where $i_n = \lfloor 2^{2n}s\rfloor \le j_n = \lfloor 2^{2n}t\rfloor \le q_n = \lfloor 2^{2n}u\rfloor \le r_n = \lfloor 2^{2n}v\rfloor$.

Since there are no common terms in the first and the second sum of (63), the two sums are independent. Thus (63) is equal to

$$P\Big\{\frac{1}{\sqrt{j_n - i_n}}\sum_{k=i_n+1}^{j_n} X_n(k) \le x'\Big\}\cdot P\Big\{\frac{1}{\sqrt{r_n - q_n}}\sum_{k=q_n+1}^{r_n} X_n(k) \le y'\Big\},$$

which can be made arbitrarily close to

$$P\{W(t) - W(s) \le x\}\cdot P\{W(v) - W(u) \le y\}. \qquad (64)$$

Since errors in the approximations can be made arbitrarily small, (62) and (64) must agree for any real $x$ and $y$. This proves (iii). □

Note that properties (ii) and (iii) are often rephrased by saying that the Wiener process is a Gaussian process with independent and stationary increments. It can be proved [4, Section 1.5] that properties (i), (ii), and (iii) characterize the Wiener process. In other words, any construction of the Wiener process gives essentially the same process that was constructed above.
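Property (ii) can be illustrated at the level of the approximations. The sketch below samples $B_n(t) - B_n(s) = 2^{-n}\big(S_n(\lfloor 4^n t\rfloor) - S_n(\lfloor 4^n s\rfloor)\big)$ directly from its independent steps; with $n = 3$, $s = 1/4$, $t = 1$ the grid points are exact, so the variance is exactly $48/64 = t - s$. Only the mean and variance are checked here, not normality, and the parameters are illustrative.

```python
import random
import statistics

def shrunk_increment(n, s, t, rng):
    """One sample of B_n(t) - B_n(s) = 2^-n (S_n(floor(4^n t)) - S_n(floor(4^n s)))."""
    i, j = int(4 ** n * s), int(4 ** n * t)
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(j - i)) / 2 ** n

rng = random.Random(5)
n, s, t = 3, 0.25, 1.0
samples = [shrunk_increment(n, s, t, rng) for _ in range(4000)]
print(statistics.mean(samples), statistics.variance(samples))  # about 0 and t - s
```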



5 From the Wiener Process to Random Walks 

Now we are going to check whether the Wiener process, as a model of Brownian motion, has the properties described in the introduction of Section 4. Namely, we would like to find the sequence of shrunk random walks $B_m(k2^{-2m})$ in $W(t)$.

Let $s(1)$ be the first (random) time instant where the magnitude of the Wiener process is 1: $s(1) = \min\{s > 0 : |W(s)| = 1\}$. The continuity and increment characteristics of the Wiener process imply that $s(1)$ exists with probability 1. Clearly, each shrunk random walk $B_m(t)$ has the symmetry property that reflecting all its sample-functions in the time axis, one gets the same process. $W(t)$, as a limiting process of shrunk random walks, inherits this feature. Therefore, setting $X(1) = W(s(1))$, we have $P\{X(1) = 1\} = P\{X(1) = -1\} = 1/2$.

Inductively, starting with $s(0) = 0$, if $s(k-1)$ is given, define the random time instant

$$s(k) = \min\{s : s > s(k-1),\ |W(s) - W(s(k-1))| = 1\} \qquad (k \ge 1).$$

As above, $s(k)$ exists with probability 1. Setting $X(k) = W(s(k)) - W(s(k-1))$, it is heuristically clear that $P\{X(k) = 1\} = P\{X(k) = -1\} = 1/2$, and $X(k)$ is independent of $X(1), X(2), \ldots, X(k-1)$.
This way one gets a random walk $S(k) = W(s(k)) = X(1) + X(2) + \cdots + X(k)$ $(k \ge 0)$ from the Wiener process. Using a more technical phrase, by this method, based on first passage times, one can imbed a random walk into the Wiener process; it is a special case of the famous Skorohod imbedding, see e.g. [1, Section 13.3].

Quite similarly, one can imbed $B_m(k2^{-2m})$ into $W(t)$ for any $m \ge 0$ by setting $s_m(0) = 0$,

$$s_m(k) = \min\{s : s > s_m(k-1),\ |W(s) - W(s_m(k-1))| = 2^{-m}\} \qquad (k \ge 1), \qquad (65)$$

and $B_m(k2^{-2m}) = W(s_m(k))$ $(k \ge 0)$.
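Since $W$ itself cannot be simulated exactly, the following sketch illustrates the first-passage rule (65) on a long simple random walk standing in for $W$ (an assumption made only for illustration; the function names are hypothetical). It records the successive times at which the walk has moved by exactly `level` from its value at the previous recorded time. The embedded steps are $\pm$`level` with equal frequency, and the mean gap is close to `level`$^2$, the exact mean exit time of a simple walk from $(-a, a)$; this mirrors $E(s_m(k) - s_m(k-1)) = 2^{-2m}$ for the Wiener process. With `level = 2` the gaps would be exactly the doubled geometric variables with mean 4 and variance 8 that describe the increments of $T_m(k)$.

```python
import random

def simple_random_walk(n_steps, rng):
    """Return S(0), ..., S(n_steps) for a simple symmetric random walk."""
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + (1 if rng.random() < 0.5 else -1))
    return path

def first_passage_times(path, level):
    """Discrete analogue of (65): s(0) = 0 and s(k) is the first index after
    s(k-1) at which the path differs from path[s(k-1)] by exactly `level`."""
    times = [0]
    for idx in range(1, len(path)):
        if abs(path[idx] - path[times[-1]]) == level:
            times.append(idx)
    return times

rng = random.Random(1)
path = simple_random_walk(40000, rng)
times = first_passage_times(path, 4)
steps = [path[b] - path[a] for a, b in zip(times, times[1:])]
gaps = [b - a for a, b in zip(times, times[1:])]
print(sum(gaps) / len(gaps))  # close to level ** 2 = 16
```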

However, instead of proving all necessary details about the Skorohod imbedding briefly described above, we will define another imbedding method better suited to our approach. It will turn out that our imbedding is essentially equivalent to the Skorohod imbedding.

Our task requires a more careful analysis of the waiting times $T_m(k)$ first. Recall the refinement property (41) of $B_m(t)$. Continuing that, we get

$$\begin{aligned}
B_m(k2^{-2m}) &= B_{m+1}\big(2^{-2(m+1)}T_{m+1}(k)\big) = B_{m+2}\big(2^{-2(m+2)}T_{m+2}(T_{m+1}(k))\big) = \cdots\\
&= B_n\big(2^{-2n}T_n(T_{n-1}(\cdots(T_{m+1}(k))\cdots))\big),
\end{aligned} \qquad (66)$$

where $k \ge 0$ and $n > m \ge 0$. In other words, $B_n(t)$, $n > m$, visits the same dyadic points $k2^{-m}$ in the same order as $B_m(t)$; only the corresponding time instants can differ.

To simplify the notation, let us introduce

$$T_{m,n}(k) = T_n(T_{n-1}(\cdots(T_{m+1}(k))\cdots)) \qquad (n > m \ge 0,\ k \ge 0).$$

Then (66) becomes

$$B_m(k2^{-2m}) = B_n\big(2^{-2n}T_{m,n}(k)\big) \qquad (n > m \ge 0,\ k \ge 0). \qquad (67)$$

The next lemma considers time lags of the form $2^{-2n}T_{m,n}(k) - k2^{-2m}$.

Note that in the proofs of the next two lemmas we make use of the following simple inequality, valid for arbitrary random variables $Z_j$, real numbers $t_j$, and events $A_j = \{Z_j \ge t_j\}$:

$$\begin{aligned}
P\{Z_j \ge t_j \text{ for some } j \ge 1\} &= P\Big\{\bigcup_{j=1}^{\infty} A_j\Big\} = P\{A_1\} + P\{A_1^c A_2\} + \cdots + P\{A_1^c \cdots A_j^c A_{j+1}\} + \cdots\\
&\le P\{Z_1 \ge t_1\} + \sum_{j=1}^{\infty} P\{Z_j < t_j,\ Z_{j+1} \ge t_{j+1}\}.
\end{aligned} \qquad (68)$$
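Inequality (68) is the expectation of a pointwise inequality between indicators: if some $A_j$ occurs, then either $A_1$ occurs or there is a first index $j + 1$ with $A_{j+1}$ occurring and $A_j$ not occurring. The brute-force sketch below (illustrative names) checks the pointwise form for every pattern of eight events.

```python
import itertools

def union_indicator(events):
    """1{A_1 or A_2 or ...} for a finite tuple of event indicators."""
    return any(events)

def bound_indicator(events):
    """1{A_1} + sum_j 1{A_j^c and A_{j+1}}, the pointwise form of (68)."""
    return events[0] + sum((not a) and b for a, b in zip(events, events[1:]))

# Check 1{union} <= bound for every pattern of 8 events.
ok = all(union_indicator(ev) <= bound_indicator(ev)
         for ev in itertools.product((False, True), repeat=8))
print(ok)  # True
```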

Lemma 5 (a) For any $C > 3/2$, $K > 0$, and $m \ge 0$ take the following subset of the sample space:

$$A_m = \Big\{\max_{0\le k2^{-2m}\le K}\big|2^{-2n}T_{m,n}(k) - k2^{-2m}\big| < \sqrt{18CKm}\,2^{-m} \text{ for all } n > m\Big\}. \qquad (69)$$

Then for any $m \ge m_0(C,K)$ we have

$$P\{A_m\} \ge 1 - 4(K2^{2m})^{1-C}. \qquad (70)$$

(b) For any $K > 0$,

$$\max_{0\le k2^{-2m}\le K}\big|2^{-2n}T_{m,n}(k) - k2^{-2m}\big| < \sqrt{27Km}\,2^{-m}$$

with probability 1 for all $n > m$, for all but finitely many $m$.

Proof. Take any $\beta$, $1/2 < \beta < 1$, and $K' > K$; for example $\beta = 2/3$ and $K' = (4/3)K$ will do. Set $\alpha_j = 1 + \beta + \beta^2 + \cdots + \beta^j$ for $j \ge 0$,

$$Z_n = \max_{0\le k2^{-2m}\le K}\big|T_{m,n}(k) - k2^{2(n-m)}\big|, \qquad t_n = \alpha_{n-m-1}\sqrt{24CK'm}\,2^{2n-m-2},$$

and $Y_{n+1} = \max_{0\le k2^{-2m}\le K}\big|T_{n+1}(T_{m,n}(k)) - 4T_{m,n}(k)\big|$ for $n > m \ge 0$. First, by Lemma 3(a),

$$P\{Z_{m+1} \ge t_{m+1}\} = P\Big\{\max_{0\le k2^{-2m}\le K}|T_{m+1}(k) - 4k| \ge \sqrt{24CK'm}\,2^m\Big\} \le 2(K2^{2m})^{1-C}$$

if $m$ is large enough.

Second, by the triangle inequality, $Z_{n+1} \le 4Z_n + Y_{n+1}$ for any $n > m$. So

$$P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\} \le P\{Z_n < t_n,\ Y_{n+1} \ge t_{n+1} - 4t_n\}.$$

If $Z_n < t_n$, then setting $j = T_{m,n}(k)$,

$$j\,2^{-2n} < 2^{-2n}\big(k2^{2(n-m)} + t_n\big) = k2^{-2m} + \alpha_{n-m-1}\sqrt{24CK'm}\,2^{-m-2} \le K + 3\sqrt{2CK'm}\,2^{-m} < (4/3)K = K'$$

holds for $m \ge m_0(C,K)$, since $\alpha_r < 1/(1-\beta) = 3$ (if $\beta = 2/3$) for any $r \ge 0$. Applying these first and Lemma 3(a) last, for $n > m \ge m_0(C,K)$ we get that

$$\begin{aligned}
P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\}
&\le P\Big\{\max_{0\le j2^{-2n}\le K'}|T_{n+1}(j) - 4j| \ge \sqrt{24CK'm}\,2^{2n-m}\big(\alpha_{n-m} - \alpha_{n-m-1}\big)\Big\}\\
&\le P\Big\{\max_{0\le j2^{-2n}\le K'}|T_{n+1}(j) - 4j| \ge \sqrt{24CK'n}\,2^n\Big\} \le 2(K'2^{2n})^{1-C}.
\end{aligned}$$

In the second inequality above we used that $\sqrt{m}\,2^{n-m}(\alpha_{n-m} - \alpha_{n-m-1}) = (4/3)^{n-m}\sqrt{m} \ge \sqrt{n}$, which follows from the inequality $2^{n-m}(2/3)^{n-m} = (4/3)^{n-m} \ge \sqrt{1 + (n-m)/m}$, valid for any $n > m \ge 2$ (if $\beta = 2/3$).
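Combining the results above (and recalling that $K' = (4/3)K$, so that $3\sqrt{24CK'm}\,2^{2n-m-2} = \sqrt{18CKm}\,2^{2n-m}$),

$$\begin{aligned}
&P\Big\{\max_{0\le k2^{-2m}\le K}\big|2^{-2n}T_{m,n}(k) - k2^{-2m}\big| \ge \sqrt{18CKm}\,2^{-m} \text{ for some } n > m\Big\}\\
&\quad = P\Big\{\max_{0\le k2^{-2m}\le K}\big|T_{m,n}(k) - k2^{2(n-m)}\big| \ge 3\sqrt{24CK'm}\,2^{2n-m-2} \text{ for some } n > m\Big\}\\
&\quad \le P\{Z_n \ge t_n \text{ for some } n \ge m+1\}\\
&\quad \le P\{Z_{m+1} \ge t_{m+1}\} + \sum_{n=m+1}^{\infty} P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\}\\
&\quad \le \sum_{n=m}^{\infty} 2(K2^{2n})^{1-C} = \frac{2(K2^{2m})^{1-C}}{1 - 2^{2(1-C)}} \le 4(K2^{2m})^{1-C}
\end{aligned}$$

if $C \ge 3/2$, say. This proves (a).

The statement in (b) follows by the Borel-Cantelli lemma (with $C = 3/2$). □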

As (67) shows, $B_n(2^{-2n}T_{m,n}(k)) = B_m(k2^{-2m})$ for any $k \ge 0$ and for any $n > m \ge 0$. A natural question, important particularly when looking at increments during short time intervals, is how much time it takes for $B_n(t)$ to go from the point $B_m((k-1)2^{-2m})$ to its next "m-neighbor" $B_m(k2^{-2m})$. Is this time significantly different from $2^{-2m}$ for large values of $m$? Introducing the notation

$$\tau_{m,n}(k) = T_{m,n}(k) - T_{m,n}(k-1) \qquad (k \ge 1,\ n > m \ge 0), \qquad (71)$$

the $n$th time differences of the $m$-neighbors are $2^{-2n}\tau_{m,n}(k)$ $(k \ge 1)$. Note that $T_{m,n}(k) = \sum_{j=1}^{k}\tau_{m,n}(j)$, where, as can be seen from the construction and the argument below, the terms are independent and have the same distribution.
Let us look at $\tau_{m,n}(k)$ more closely. If $n = m+1$,

$$\tau_{m,m+1}(k) = T_{m+1}(k) - T_{m+1}(k-1) = \tau_{m+1}(k), \qquad (72)$$

which is the double of a geometric random variable with parameter $p = 1/2$, see (14). That is, $2^{-2(m+1)}\tau_{m,m+1}(k)$ is the length of the time period that corresponds to the time interval $[(k-1)2^{-2m}, k2^{-2m}]$ after the next refinement of the construction.

Similarly, each unit in $\tau_{m+1}(k)$ will bring some $\tau_{m+2}(r)$ "offspring" after the following refinement, and so on. Hence if $n > m$ is arbitrary, then given $T_{m,n}(k-1) = j$ for some integer $j \ge 0$, we have

$$\tau_{m,n+1}(k) = T_{n+1}\big(j + \tau_{m,n}(k)\big) - T_{n+1}(j) = \sum_{r=1}^{\tau_{m,n}(k)}\tau_{n+1}(j + r). \qquad (73)$$

Given $\tau_{m,n}(k) = s$ ($s > 0$, even), the conditional distribution of $\tau_{m,n+1}(k)$ is the same as the distribution of a random variable $T_s$ which is the double of a negative binomial random variable with parameters $s$ and $p = 1/2$, described by (19) and (20). Note that this conditional distribution of $\tau_{m,n+1}(k)$ is independent of the value of $T_{m,n}(k-1)$.
Though we will not explicitly use them, it is instructive to determine some further properties of a "prototype" $\tau_{m,n} = \tau_{m,n}(1)$. A recursive formula can be given for its expectation $\mu_n = E(\tau_{m,n})$ by (73) and the full expectation formula:

$$\mu_{n+1} = E(\tau_{m,n+1}) = E\big(E(\tau_{m,n+1} \mid \tau_{m,n})\big) = E\big(\tau_{m,n}\,E(\tau_{n+1}(r))\big) = 4\mu_n.$$

Since $\mu_{m+1} = E(\tau_{m+1}(r)) = 4$, it follows that

$$\mu_n = E(\tau_{m,n}) = 2^{2(n-m)}.$$

This argument also implies that

$$E\big(2^{-2(n+1)}\tau_{m,n+1} \mid 2^{-2n}\tau_{m,n}\big) = 2^{-2n}\tau_{m,n}.$$

These show that the sequence $(2^{-2n}\tau_{m,n})_{n=m+1}^{\infty}$ is a so-called martingale. Therefore a famous martingale convergence theorem [1, Section 5.4] implies that this sequence converges to a random variable $\tau_m$ as $n \to \infty$, with probability 1, and $\tau_m$ has finite expectation. We mention that a similar recursion can be obtained for the variance, which results in

$$\mathrm{Var}\big(2^{-2n}\tau_{m,n}\big) < \frac{2}{3}\,2^{-4m}.$$
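The recursion (73) describes a Galton-Watson branching process whose offspring distribution is $2G$ (mean 4, variance 8). The sketch below simulates it under that description (helper names are illustrative) and looks at $4^{-(n-m)}\tau_{m,n} = 4^m\cdot 2^{-2n}\tau_{m,n}$ with $n - m = d$. Its mean should be close to 1, and its variance close to $\frac{2}{3}(1 - 4^{-d})$, the value given by the standard variance formula for branching processes; this is consistent with the bound on $\mathrm{Var}(2^{-2n}\tau_{m,n})$ above.

```python
import random
import statistics

def doubled_geometric(rng):
    """2G with G ~ Geometric(1/2) on {1, 2, ...}: mean 4, variance 8."""
    g = 1
    while rng.random() < 0.5:
        g += 1
    return 2 * g

def normalized_tau(d, rng):
    """4^-d * tau_{m,m+d}: start from one unit of time; in each of d refinements
    every unit is replaced by an independent doubled geometric number of units."""
    tau = 1
    for _ in range(d):
        tau = sum(doubled_geometric(rng) for _ in range(tau))
    return tau / 4 ** d

rng = random.Random(11)
d = 3
values = [normalized_tau(d, rng) for _ in range(3000)]
print(statistics.mean(values))                                  # close to 1
print(statistics.variance(values), 2 / 3 * (1 - 4 ** -d))       # close to each other
```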

The next lemma gives an upper bound for the $n$th time differences of the $m$-neighbors by showing that during arbitrarily many refinements, they cannot be "much" larger than $h = 2^{-2m}$, the original time difference of the $m$-neighbors. More accurately, they are less than a multiple of $h^{1-\delta}$, where $\delta > 0$ is arbitrary.

Lemma 6 (a) For any $K > 0$, $\delta$ such that $0 < \delta < 1$, and $C > 2/\delta$ we have

$$P\Big\{\max_{0<k2^{-2m}\le K}\big|2^{-2n}\tau_{m,n}(k) - 2^{-2m}\big| \ge 3C2^{-2m(1-\delta)} \text{ for some } n > m\Big\} \le \frac{K}{10}\,2^{-2m(\delta C - 2)}$$

if $m \ge m_0(\delta, C)$.

(b) For any $K > 0$ and $\delta$ such that $0 < \delta < 1$,

$$\max_{0<k2^{-2m}\le K}\big|2^{-2n}\tau_{m,n}(k) - 2^{-2m}\big| < \frac{7}{\delta}\,2^{-2m(1-\delta)}$$

with probability 1 for all $n > m$, for all but finitely many $m$.

Proof. This proof is very similar to the proof of Lemma 5. Take any $\beta$, $1/2 < \beta < 1$; for example $\beta = 2/3$ will do. Set $\alpha_j = 1 + \beta + \beta^2 + \cdots + \beta^j$ for $j \ge 0$. For any $m \ge 0$, consider an arbitrary $k$, $1 \le k \le K2^{2m}$. (The distribution of $\tau_{m,n}(k)$ does not depend on $k$.) Let

$$Z_n = \big|\tau_{m,n}(k) - 2^{2(n-m)}\big|, \qquad t_n = \alpha_{n-m-1}\,C2^{2\delta m}\,2^{2(n-m)},$$

and $Y_{n+1} = |\tau_{m,n+1}(k) - 4\tau_{m,n}(k)|$ for $n > m \ge 0$. We want to apply inequality (68).
First take $n = m+1$. By (72), $\frac{1}{2}\tau_{m,m+1}(k)$ is a geometric random variable with parameter $p = 1/2$. Then

$$P\{Z_{m+1} \ge t_{m+1}\} = P\big\{|\tau_{m,m+1}(k) - 4| \ge 4C2^{2\delta m}\big\} = P\Big\{\tfrac{1}{2}\tau_{m+1}(k) \ge 2 + 2C2^{2\delta m}\Big\} \le 2^{-4C\delta m},$$

because of the basic property of the tail of a geometric distribution. (The lower tail does not contribute, since $\tau_{m,m+1}(k) \ge 2$ while $4 - 4C2^{2\delta m} < 0$.)

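Second, let $n > m$ be arbitrary. By the triangle inequality, $Z_{n+1} \le 4Z_n + Y_{n+1}$. So we obtain

$$\begin{aligned}
P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\} &\le P\{Z_n < t_n,\ Y_{n+1} \ge t_{n+1} - 4t_n\}\\
&\le \sum_{s:\,|s - 2^{2(n-m)}| < t_n} P\big\{|\tau_{m,n+1}(k) - 4s| \ge t_{n+1} - 4t_n \,\big|\, \tau_{m,n}(k) = s\big\}\,P\{\tau_{m,n}(k) = s\}\\
&\le P\Big\{\max_{0\le s< 2^{2(n-m)} + t_n}|T_s - 4s|/\sqrt{8} \ge (t_{n+1} - 4t_n)/\sqrt{8}\Big\}\sum_{s} P\{\tau_{m,n}(k) = s\},
\end{aligned} \qquad (74)$$

where we applied (73) and the conditional distribution of $\tau_{m,n+1}(k)$ mentioned there.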

The sum in (74) is at most 1. Therefore we want to estimate the probability of the maximum there by using Lemma 2 with $N = 4C2^{2\delta m}2^{2(n-m)}$. This $N$ is larger than $2^{2(n-m)} + t_n$ if $m$ is large enough, depending on $\delta$. (Remember that $\alpha_j < 3$ for any $j \ge 0$ if $\beta = 2/3$.) Note also that $t_{n+1} - 4t_n = C2^{2\delta m}2^{2(n+1-m)}\beta^{n-m} = N(2/3)^{n-m}$. To apply Lemma 2 we have to compare $\sqrt{2CN\log N}$ to the right-hand side of the inequality in (74):

$$\frac{\sqrt{2CN\log N}}{(t_{n+1} - 4t_n)/\sqrt{8}} = \left(\frac{4\big[(n-m)\log 4 + m\delta\log 4 + \log(4C)\big]}{(4/3)^{2(n-m)}\,2^{2\delta m}}\right)^{1/2},$$

which is less than 1 for all $n > m$ if $m$ is large enough, depending on $\delta$ and $C$. Thus Lemma 2 gives that

$$P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\} \le P\Big\{\max_{0\le s\le N}|T_s - 4s|/\sqrt{8} \ge \sqrt{2CN\log N}\Big\} \le 2N^{1-C} = 2\big(4C2^{2\delta m}2^{2(n-m)}\big)^{1-C}$$

for all $n > m$ as $m \ge m_0(\delta, C)$.

Summing up for $n > m$, we obtain the following estimate for any given $k$, $1 \le k \le K2^{2m}$, as $m \ge m_0(\delta, C)$:

$$\begin{aligned}
&P\big\{|2^{-2n}\tau_{m,n}(k) - 2^{-2m}| \ge 3C2^{-2m(1-\delta)} \text{ for some } n > m\big\}\\
&\quad \le P\{Z_n \ge t_n \text{ for some } n \ge m+1\}\\
&\quad \le P\{Z_{m+1} \ge t_{m+1}\} + \sum_{n=m+1}^{\infty} P\{Z_n < t_n,\ Z_{n+1} \ge t_{n+1}\}\\
&\quad \le 2^{-4C\delta m} + 2^{2\delta(1-C)m}\sum_{n=m+1}^{\infty} 2(4C)^{1-C}\,2^{2(1-C)(n-m)}\\
&\quad \le \frac{1}{10}\,2^{-2m(\delta C - 1)},
\end{aligned}$$
where we took into consideration that $0 < \delta < 1$, $C > 2$, and $\delta C > 2$ by our assumptions.

Finally, the statement in (a) can be obtained by an application of the crude inequality (42):

$$P\Big\{\max_{1\le k\le K2^{2m}}\big|2^{-2n}\tau_{m,n}(k) - 2^{-2m}\big| \ge 3C2^{-2m(1-\delta)} \text{ for some } n > m\Big\} \le K2^{2m}\,\frac{1}{10}\,2^{-2m(\delta C - 1)}$$

as $m \ge m_0(\delta, C)$, which is equivalent to (a).

The statement in (b) follows by the Borel-Cantelli lemma with $C = 7/(3\delta)$ (say). □
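The "basic property of the tail of a geometric distribution" used in the first step of the proof of Lemma 6 is $P\{G \ge x\} = 2^{-(\lceil x\rceil - 1)}$ for $G$ geometric with parameter $1/2$ on $\{1, 2, \ldots\}$ and real $x \ge 1$. The exact-arithmetic sketch below (illustrative names and parameter values) checks this identity and the resulting bound $P\{G \ge 2 + 2C2^{2\delta m}\} \le 2^{-4C\delta m}$ for a few sample parameters.

```python
import math
from fractions import Fraction

def geometric_tail(x):
    """P{G >= x} for G ~ Geometric(1/2) on {1, 2, ...}; equals 2^-(ceil(x) - 1) for x >= 1."""
    return Fraction(1, 2 ** (math.ceil(x) - 1))

def check_first_step(c, delta, m):
    """Is P{G >= 2 + 2C 2^(2 delta m)} <= 2^(-4 C delta m)?"""
    threshold = 2 + 2 * c * 2 ** (2 * delta * m)
    return geometric_tail(threshold) <= 2.0 ** (-4 * c * delta * m)

print(check_first_step(3, 0.8, 5), check_first_step(7 / 3 / 0.9, 0.9, 3))  # True True
```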

Now we define a certain imbedding of shrunk random walks $B_m(k2^{-2m})$ into the Wiener process $W(t)$.

Lemma 7 (a) For any $C > 3/2$, $K' > K > 0$, and any fixed $m \ge m_0(C,K,K')$ there exist random time instants $t_m(k) \in [0,K']$ such that

$$P\{W(t_m(k)) = B_m(k2^{-2m}),\ 0 \le k2^{-2m} \le K\} \ge 1 - 4(K2^{2m})^{1-C},$$

where

$$P\Big\{\max_{0\le k2^{-2m}\le K}|t_m(k) - k2^{-2m}| \ge \sqrt{18CKm}\,2^{-m}\Big\} \le 4(K2^{2m})^{1-C}. \qquad (75)$$

Moreover, if $\delta$ is such that $0 < \delta < 1$, $C > 2/\delta$, and $m \ge m_1(\delta,C,K,K')$, then we also have

$$P\Big\{\max_{0<k2^{-2m}\le K}|t_m(k) - t_m(k-1) - 2^{-2m}| \ge 3C2^{-2m(1-\delta)}\Big\} \le \frac{K}{10}\,2^{-2m(\delta C - 2)} + 4(K2^{2m})^{1-C}. \qquad (76)$$

(b) With probability 1, for any $K' > K > 0$, $0 < \delta < 1$, and for all but finitely many $m$ there exist random time instants $t_m(k) \in [0,K']$ such that

$$W(t_m(k)) = B_m(k2^{-2m}) \qquad (0 \le k2^{-2m} \le K),$$

where

$$\max_{0\le k2^{-2m}\le K}|t_m(k) - k2^{-2m}| < \sqrt{27Km}\,2^{-m}$$

and

$$\max_{0<k2^{-2m}\le K}|t_m(k) - t_m(k-1) - 2^{-2m}| < (7/\delta)\,2^{-2m(1-\delta)}.$$

Proof. By Lemma 5(a), fixing an $m \ge m_0(C,K,K')$, on a subset $A_m$ of the sample space with $P\{A_m\} \ge p_m = 1 - 4(K2^{2m})^{1-C}$, one has

$$\max_{0\le k2^{-2m}\le K}\big|2^{-2n}T_{m,n}(k) - k2^{-2m}\big| < \sqrt{18CKm}\,2^{-m} \qquad (77)$$

for each $n > m$. In particular, the time instants $2^{-2n}T_{m,n}(k)$ are bounded from below by $0$ and from above by $K + \sqrt{18CKm}\,2^{-m} < K'$. (Assume that $m_0(C,K,K')$ is chosen so.)

Applying a truncation $t^*_{m,n}(k) = \min\{K', 2^{-2n}T_{m,n}(k)\}$, for each $k$, $0 \le k2^{-2m} \le K$, we get a sequence in $n$ bounded over the whole sample space, equal to the original one for $\omega \in A_m$. It follows from the classical Weierstrass theorem [7, Section 2.42] that every bounded sequence of real numbers contains a convergent subsequence. To be definite, let us take the lower limit [7, Section 3.16] of the sequence:

$$t_m(k) = \liminf_{n\to\infty} t^*_{m,n}(k). \qquad (78)$$

Then $t_m(k) \in [0,K']$.

By Theorem 3, with probability 1 the sample-functions of $B_n(t)$ converge uniformly to the corresponding sample-functions of the Wiener process, which are uniformly continuous on $[0,K']$. (A continuous function on a closed interval is uniformly continuous [7, Section 4.19].) Thus (67) implies that for each $k$, $0 \le k2^{-2m} \le K$, we have $W(t_m(k)) = B_m(k2^{-2m})$ with probability at least $p_m$ (on the set $A_m$, where the truncated sequences coincide with the original ones).

To show it in detail, take any $\varepsilon > 0$, any $k$ $(0 \le k2^{-2m} \le K)$, and a subsequence $t^*_{m,n_i}(k)$ converging to $t_m(k)$ as $i \to \infty$. Then

$$\begin{aligned}
|W(t_m(k)) - B_m(k2^{-2m})| &= \big|W(t_m(k)) - B_{n_i}\big(2^{-2n_i}T_{m,n_i}(k)\big)\big|\\
&\le \big|W(t_m(k)) - W\big(2^{-2n_i}T_{m,n_i}(k)\big)\big| + \big|W\big(2^{-2n_i}T_{m,n_i}(k)\big) - B_{n_i}\big(2^{-2n_i}T_{m,n_i}(k)\big)\big|\\
&< \varepsilon/2 + \varepsilon/2 = \varepsilon,
\end{aligned}$$

where the last inequality holds on the set $A_m$ for all but finitely many $n_i$: the first term is small by the uniform continuity of $W$, the second by the uniform convergence of $B_n$ to $W$. Since $\varepsilon$ was arbitrary, it follows that $|W(t_m(k)) - B_m(k2^{-2m})| = 0$ on $A_m$.

Further, taking a limit in (77) with $n = n_i$ as $i \to \infty$ (on the set $A_m$), one obtains (75). Also, taking a similar limit in Lemma 6(a), $2^{-2n_i}\tau_{m,n_i}(k) \to t_m(k) - t_m(k-1)$ on the set $A_m$, and (76) follows.

The statements in (b) can be obtained similarly to (a), applying Lemmas 5(b) and 6(b), or from (a) by the Borel-Cantelli lemma. □

We mention that for any $k \ge 0$ and $m \ge 0$, the sequence $2^{-2n}T_{m,n}(k)$ in fact converges to $t_m(k)$ with probability 1 as $n \to \infty$. However, a "natural" proof of this fact requires the martingale convergence theorem mentioned above, before Lemma 6, a tool of more advanced nature than the ones we use in this paper.

Next we want to show that the random time instants $s_m(k)$ of the Skorohod imbedding (65) and the $t_m(k)$'s defined in (78) are essentially the same. This requires recalling some properties of random walks.

We want to estimate the probability that, with given positive integers $j$, $x$, $u$, and $r$, a random walk $S_i$ goes from a point $|S_j| = x$ to $|S_{j+k}| = x + y$ so that $|S_{j+i}| < x + y$ while $1 \le i < k$, for some $y \le r$ and $k \ge u$, where $k$, $y$, and $i$ are also positive integers.

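The first passage distribution given in [2, Section III, 7] can be applied here:

$$P\{S_0 = 0,\ S_i < y\ (1 \le i < k),\ S_k = y\} = \frac{y}{k}\binom{k}{(k+y)/2}2^{-k},$$

where the binomial coefficient is taken to be 0 when $k + y$ is odd. Hence, by Theorem 1,

$$\begin{aligned}
&P\{|S_j| = x,\ |S_{j+i}| < x + y\ (1 \le i < k),\ |S_{j+k}| = x + y \text{ for some } y \le r\}\\
&\quad \le \sum_{y=1}^{r}\frac{y}{k}\binom{k}{(k+y)/2}2^{-k} \le \frac{r}{k}\,P\{0 < S_k \le r\} \le (1 + \varepsilon)\,\frac{r}{k}\big(\Phi(r/\sqrt{k}) - \Phi(0)\big) \le \frac{r^2}{k^{3/2}},
\end{aligned}$$

where $\varepsilon > 0$ is arbitrary, say equals 1, and $k \ge k_0$. (In the last step we used $\Phi(r/\sqrt{k}) - \Phi(0) \le r/\sqrt{2\pi k}$ and $2/\sqrt{2\pi} < 1$.)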

So the larger the value of $k$ is, the smaller the estimate of the probability we get. Thus for all positive integers $j$, $x$, $r$, and $u \ge k_0$,

$$P\{|S_j| = x,\ |S_{j+i}| < x + y\ (1 \le i < k),\ |S_{j+k}| = x + y \text{ for some } y \le r,\ k \ge u\} \le r^2/u^{3/2}, \qquad (79)$$

independently of the values of $j$ and $x$.
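The first passage formula quoted above from [2] is exact for every $k$, which makes it easy to check by brute force. The sketch below (illustrative names) computes the probability that a simple walk started at 0 first reaches $y$ at time $k$ by dynamic programming over the positions of paths that have not yet hit $y$, and compares it with $\frac{y}{k}\binom{k}{(k+y)/2}2^{-k}$ in exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def first_passage_dp(y, k):
    """P{S_i < y for 1 <= i < k, S_k = y} for a simple walk from 0, by dynamic programming."""
    dist = {0: Fraction(1)}  # position -> probability of being there without having hit y
    for step in range(1, k + 1):
        new = {}
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                new[nxt] = new.get(nxt, 0) + p / 2
        if step == k:
            return new.get(y, Fraction(0))
        new.pop(y, None)  # discard paths that hit y before time k
        dist = new
    return Fraction(0)

def first_passage_formula(y, k):
    """(y/k) * C(k, (k+y)/2) * 2^-k, taken as 0 when k + y is odd."""
    if (k + y) % 2:
        return Fraction(0)
    return Fraction(y, k) * comb(k, (k + y) // 2) / 2 ** k

print(first_passage_dp(1, 3), first_passage_formula(1, 3))  # 1/8 1/8
```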

Theorem 4 The stopping times $s_m(k)$ $(k \ge 0)$ of the Skorohod imbedding are equal to the time instants $t_m(k)$ of the imbedding defined in Lemma 7 on the set $A_m$ of the sample space given by (69), with the possible exception of a zero probability subset. Therefore all statements in Lemma 7 hold when $s_m(k)$ replaces $t_m(k)$.

Proof. Fix an $m \ge m_0(C,K,K')$, where $m_0(C,K,K')$ is the same as in Lemma 7. Let the subset $A_m$ of the sample space be given by (69).

Take $k = 1$ first. Since $s_m(1)$ is the smallest time instant where $|W(t)|$ is equal to $2^{-m}$, and $|W(t_m(1))| = 2^{-m}$ on the set $A_m$, it follows that $s_m(1) \le t_m(1)$ on $A_m$. We want to show that on $A_m$ the event $\{s_m(1) < t_m(1)\}$ has zero probability.

Indirectly, let us suppose that $\delta_m = t_m(1) - s_m(1) > 0$ on a subset $C_m$ of $A_m$ with positive probability. By (67), the first time instant where $|B_n(t)|$ equals $|B_m(2^{-2m})| = 2^{-m}$ is $2^{-2n}T_{m,n}(1)$ $(n > m)$. So $|B_n(t)| < 2^{-m}$ if $0 \le t < 2^{-2n}T_{m,n}(1)$. On the other hand, by (55), $2^{-m} - n2^{-n/2} \le |B_n(s_m(1))| < 2^{-m}$ for $n \ge N_1(\omega)$ on a probability 1 $\omega$-set. (Remember that $|W(s_m(1))| = 2^{-m}$.)

Since $\delta_m > 0$ on the set $C_m$, there exists an $N_2(\omega)$ such that $n2^{-n/2} < \delta_m/2$ for $n \ge N_2(\omega)$.

By (78), $t_m(1) = \liminf_{n\to\infty} 2^{-2n}T_{m,n}(1)$ on the set $A_m$. The properties of the lower limit [7, Section 3.17] imply that on the subset $C_m$ there exists an $N_3(\omega)$ such that $2^{-2n}T_{m,n}(1) > t_m(1) - \delta_m/2$ for $n \ge N_3(\omega)$.

Set $N(\omega) = \max\{N_1(\omega), N_2(\omega), N_3(\omega)\}$ for $\omega \in C_m$. Since $B_n(t) = 2^{-n}S_n(t2^{2n})$, the statements above imply that on the set $C_m$ the random walk $S_n(t)$ has the following properties for $n \ge N(\omega)$:

(a) $|S_n(s_m(1)2^{2n})| \ge 2^{n-m} - n2^{n/2}$,

(b) $|S_n(t)| < 2^{n-m}$ for $s_m(1)2^{2n} \le t < T_{m,n}(1)$, where $T_{m,n}(1) - s_m(1)2^{2n} > (\delta_m/2)2^{2n} > n2^{3n/2}$,

(c) $|S_n(T_{m,n}(1))| = 2^{n-m}$.

Let $D_{m,n} = \{\omega \in C_m : N(\omega) \le n\}$, a subset of $C_m$ on which (a), (b), and (c) hold for this $n$. Since $D_{m,n} \subset D_{m,n+1}$ for each $n$ and $\bigcup_n D_{m,n} = C_m$, by the continuity property of probability [7, Section 11.3] we have $\lim_{n\to\infty} P\{D_{m,n}\} = P\{C_m\} > 0$. This implies that there exists an integer $n_0$ such that
$P\{D_{m,n}\} > \frac{1}{2}P\{C_m\} > 0$ holds for all $n \ge n_0$ (say). In other words, for all large enough values of $n$, the probability of the event that (a), (b), and (c) hold simultaneously is larger than a fixed positive number.

To get a contradiction, we apply (79) to $S_n(t)$, with $r = n2^{n/2}$ and $u = n2^{3n/2}$. Theorem 1, which was used to deduce (79), still applies since $r = o(u^{2/3})$, i.e. $r/\sqrt{u} = o(u^{1/6})$. Now the first passage time when $|S_n(t)|$ hits $2^{n-m}$ is $T_{m,n}(1)$. Thus the probability that $S_n(t)$ satisfies (a), (b), and (c) simultaneously is less than or equal to

$$\frac{r^2}{u^{3/2}} = \frac{(n2^{n/2})^2}{(n2^{3n/2})^{3/2}} = \frac{\sqrt{n}}{2^{5n/4}},$$

which goes to zero as $n \to \infty$. This contradicts the statement above that for all large enough values of $n$, the event that (a), (b), and (c) hold has a probability larger than a fixed positive number. This proves the theorem for $k = 1$: $s_m(1) = t_m(1)$ on the set $A_m$, with the possible exception of a zero probability subset.

For $k > 1$, one can proceed by induction. Assume that $s_m(k-1) = t_m(k-1)$ holds on $A_m$ except possibly for a subset of probability zero. The proof that then $s_m(k) = t_m(k)$ holds as well is essentially the same as the proof of the case $k = 1$ above. This is true because, on the one hand, $s_m(k)$ is defined recursively in (65), using $s_m(k-1)$, the same way as $s_m(1)$ is defined. On the other hand, by (71), $T_{m,n}(k) = T_{m,n}(k-1) + \tau_{m,n}(k)$, where $\tau_{m,n}(k)$ is defined the same way as $\tau_{m,n}(1) = T_{m,n}(1)$. Also, remember that on the set $A_m$, $t_m(j) = \liminf_{n\to\infty} 2^{-2n}T_{m,n}(j)$ for $j = k-1$ or $j = k$. □



6 Some Properties of the Wiener Process 

Theorem 3 above indicates that the sample-functions of the Wiener process are arbitrarily close to the sample-functions of $B_n(t)$ if $n$ is large enough, with probability 1. The sample-functions of $B_n(t)$ are broken lines that have a chance of 1/2 to turn and have a corner at any multiple of time $1/2^{2n}$, so at more and more instants of time as $n \to \infty$. Moreover, the magnitude of the slopes of the line segments that make up the graph of $B_n(t)$ is

$$\frac{1/2^n}{1/2^{2n}} = 2^n \to \infty \quad \text{as } n \to \infty.$$

Therefore one would suspect that the sample-functions of the Wiener process are typically nowhere differentiable. As we will see below, this is indeed true. Thus typical sample-functions of the Wiener process belong to the "strange" class of everywhere continuous but nowhere differentiable functions.

Theorem 5 With probability 1, the sample-functions of the Wiener process are nowhere differentiable.

Proof. It suffices to show that with probability 1 the sample-functions are nowhere differentiable on any interval $[0, K]$. Put $K' = (3/2)K > 0$ (say). Then with probability 1, for all sample-functions and for all but finitely many $m$, there exist time instants $t_m(k)$ $(0 \le k2^{-2m} \le K')$ with the properties described in Lemma 7(b). In particular,

$$\max_{0 \le k2^{-2m} \le K'} t_m(k) \ge K' - \sqrt{27K'm}\,2^{-m} > K$$

if $m$ is large enough.

Fix an $\omega$ in this probability 1 subset of the sample space. This defines a specific sample-function of $W(t)$ and specific values of the random time instants $t_m(k)$. (To simplify the notation, in this proof we suppress the argument $\omega$.) Then, choosing an arbitrary point $t \in [0, K]$, for each large enough $m$ one has $t_m(k-1) \le t \le t_m(k)$ for some $k$ with $0 < k2^{-2m} \le K'$. Taking for instance $\delta = 1/4$ in Lemma 7(b), we get $t_m(k) - t_m(k-1) \le 29 \cdot 2^{-(3/2)m}$ and

$$|W(t_m(k)) - W(t_m(k-1))| = |B_m(k2^{-2m}) - B_m((k-1)2^{-2m})| = 2^{-m}.$$

Set $t^*_m = t_m(k)$ if $|W(t) - W(t_m(k))| \ge |W(t) - W(t_m(k-1))|$, and $t^*_m = t_m(k-1)$ otherwise. Then, by the triangle inequality, $|W(t) - W(t^*_m)| \ge (1/2)2^{-m}$. So $|t^*_m - t| \le 29 \cdot 2^{-(3/2)m}$ and

$$\frac{|W(t^*_m) - W(t)|}{|t^*_m - t|} \ge \frac{(1/2)2^{-m}}{29 \cdot 2^{-(3/2)m}} = \frac{1}{58}\,2^{m/2} \to \infty$$

as $m \to \infty$. This shows that the given sample-function cannot be differentiable at any point $t \in [0, K]$. □

It has important consequences for the definition of stochastic integrals that, as shown below, the graph of a typical sample-function of the Wiener process has infinite length. In general, (the graph of) a function $f$ defined on an interval $[a, b]$ has finite length (or $f$ is said to be of bounded variation on $[a, b]$) if there exists a finite constant $c$ with the following property: for any partition $a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$, the sum of the absolute values of the corresponding changes does not exceed $c$:

$$\sum_{j=1}^{n} |f(x_j) - f(x_{j-1})| \le c.$$

The smallest $c$ with this property is called the total variation of $f$ over $[a, b]$, denoted $V(f(t), a \le t \le b)$. Otherwise we say that the graph has infinite length, or that $f$ is of unbounded variation on $[a, b]$.

First let us calculate the total variation of a sample-function of $B_m(t)$ over an interval $[0, K]$. Each sample-function of $B_m(t)$ over $[0, K]$ is a broken line that consists of $K2^{2m}$ line segments with changes of magnitude $2^{-m}$. So for any sample-function of $B_m(t)$,

$$V(B_m(t), 0 \le t \le K) = K2^{2m}\,2^{-m} = K2^m, \qquad (80)$$

which tends to infinity as $m \to \infty$.
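Identity (80) can be checked directly on a simulated sample-function. The sketch below uses helper names of our own (`shrunk_walk`, `total_variation`) and assumes an integer $K$, so that $K2^{2m}$ is an integer. It builds the vertices $B_m(k2^{-2m}) = 2^{-m}S_m(k)$ with exact rational arithmetic and sums the absolute changes between consecutive vertices. Because $B_m$ is linear between its vertices, this sum is its total variation.

```python
import random
from fractions import Fraction

def shrunk_walk(m, K, rng):
    """Vertices B_m(k 2^{-2m}) = 2^{-m} S_m(k), k = 0, ..., K * 2^{2m}, for a positive integer K."""
    s, values = 0, [Fraction(0)]
    for _ in range(K * 4**m):
        s += rng.choice((-1, 1))             # one step X_k = +-1 of the simple symmetric walk
        values.append(Fraction(s, 2**m))    # space shrunk by the factor 2^{-m}
    return values

def total_variation(values):
    """Total variation of the broken line through these vertices (it is linear in between)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

rng = random.Random(1)
for m in range(5):
    print(m, total_variation(shrunk_walk(m, K=2, rng=rng)))   # K * 2^m: 2, 4, 8, 16, 32
```

The result does not depend on the random path at all: every segment changes by exactly $2^{-m}$. The growth $K2^m$ as $m$ increases is the discrete counterpart of Lemma 8.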

Lemma 8 For any $K' > 0$, the sample-functions of the Wiener process over $[0, K']$ have infinite length (i.e. are of unbounded variation) with probability 1.



Proof. By Lemma 7, for any $C > 3/2$, $K' > K > 0$, and $m \ge m_0(C, K, K')$, there exist time instants $t_m(k) \in [0, K']$ such that

$$P\{W(t_m(k)) = B_m(k2^{-2m}),\ 0 \le k2^{-2m} \le K\} \ge 1 - 4(K2^{2m})^{1-C}. \qquad (81)$$

For each $m \ge 0$ define the following event:

$$C_m = \{V(W(t), 0 \le t \le K') < K2^m\}.$$

Then $C_m \subset C_{m+1}$ for any $m \ge 0$.

For any sample-function of $W(t)$, take the partition $0 = t_m(0) < t_m(1) < \cdots < t_m(K2^{2m})$. (To alleviate the notation, we suppress the dependence on $\omega$.) By (81), for any $m \ge m_0(C, K, K')$, the sum of the corresponding absolute changes is equal to $K2^{2m}\,2^{-m} = K2^m$ with probability at least $1 - 4(K2^{2m})^{1-C}$. On this event $V(W(t), 0 \le t \le K') \ge K2^m$.

This shows that $P\{C_m\} \le 4(K2^{2m})^{1-C}$. Take the event

$$C_\infty = \{V(W(t), 0 \le t \le K') < \infty\} = \bigcup_{m} C_m.$$

The continuity property of probability implies that $P\{C_m\} \to P\{C_\infty\}$ as $m \to \infty$, that is, $P\{C_\infty\} = 0$. □

The next lemma shows a certain uniform continuity property of the Wiener process. An interesting consequence of the lemma concerns any fixed $u > 0$: the probability that $|W(t) - W(s)| \ge u$ holds for some $s, t \in [0, K]$ with $|t - s| \le h$ can be made arbitrarily small by choosing $h$ small enough. More precisely, the lemma shows that the increment of the Wiener process over time intervals of length at most $h$ can be larger than $c\sqrt{h}$ only with small probability, if the constant $c$ is large enough. Now $\sqrt{h}$ is much larger than $h$ for small values of $h$, so this also indicates why the sample-functions of the Wiener process are not differentiable. At the same time it gives a rough measure of the so-called modulus of continuity of the process. Basically, the proof relies on Theorem 1a and Theorem 3.

Lemma 9 For any $K > 0$, $0 < \delta < 1$, and $u > 0$ there exists an $h_0(K, \delta, u) > 0$ such that

$$P\Big\{\max_{s,t\in[0,K],\ |t-s|\le h} |W(t) - W(s)| \ge u\Big\} \le 7e^{-\frac{u^2}{2h}(1-\delta)} \qquad (82)$$

for all positive $h \le h_0(K, \delta, u)$.

Proof. First we choose a large enough $C > 3/2$ such that $2/(C-1) < \delta/2$. For instance, $C = 1 + (6/\delta)$ will do.

By (54), the probability in (82) cannot exceed

$$6(K2^{2n})^{-6/\delta} + P\Big\{\max_{0\le s\le K-h}\ \max_{s\le t\le s+h} |B_n(t) - B_n(s)| \ge u - 2n2^{-n/2}\Big\} \qquad (83)$$

for $n \ge n_0(K, \delta)$. (Remember that now $1 - C = -6/\delta$.)
By definition, $B_n(t) = 2^{-n}S_n(t2^{2n})$ for $t \ge 0$. For each $s < t$ from $[0, K]$ and $n \ge n_0(K, \delta)$, take the integers $s_n = \lceil s2^{2n}\rceil$ and $t_n = \max\{s_n, \lfloor t2^{2n}\rfloor\}$. ($\lceil x\rceil$ denotes the smallest integer greater than or equal to $x$, while $\lfloor x\rfloor$ is the largest integer smaller than or equal to $x$.)

Then $|t_n - t2^{2n}| \le 1$ and so $|S_n(t_n) - S_n(t2^{2n})| \le 1$; similarly for $s_n$. Moreover, $0 \le t_n - s_n \le h2^{2n}$ if $0 \le t - s \le h$. Hence (83) does not exceed

$$6(K2^{2n})^{-6/\delta} + P\Big\{\max_{0\le j\le K2^{2n}}\ \max_{0\le k\le h2^{2n}} |S_n(j+k) - S_n(j)| \ge 2^n\big(u - 2n2^{-n/2}\big) - 2\Big\} \qquad (84)$$

for $n \ge n_0(K, \delta)$.

For any value of $k \ge 0$, the distribution of $S_n(j+k) - S_n(j)$ above is the same as the distribution of a random walk $S(k)$, independently of $j \ge 0$. Also, the largest possible value of $|S(k)|$ is $k$. Therefore, by Theorem 1a, the inequality (5), and the crude estimate (42),

$$P\Big\{\max_{0\le k\le h2^{2n}} |S_n(j+k) - S_n(j)| \ge 2^n\big(u - 2n2^{-n/2} - 2\cdot 2^{-n}\big)\Big\} \le h2^{2n}\max_{0<k\le h2^{2n}} P\Big\{|S(k)| \ge 2^n u\sqrt{1-\delta/2}\Big\} \le h2^{2n} e^{-\frac{u^2}{2h}(1-\delta/2)}.$$

Here it was assumed that $2n2^{-n/2} + 2\cdot 2^{-n} \le u\big(1 - \sqrt{1-\delta/2}\big)$, which certainly holds if $n \ge n_1(K, \delta, u) \ge n_0(K, \delta)$. We also assumed that $\frac{u}{\sqrt{h}}\sqrt{1-\delta/2} \ge 3/\sqrt{2\pi}$, see (6). This is true if $h$ is small enough, depending on $\delta$ and $u$.

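Consequently, applying the crude estimate (42) again to (84), we obtain

$$\begin{aligned} P\Big\{\max_{s,t\in[0,K],\ |t-s|\le h} |W(t) - W(s)| \ge u\Big\} &\le 6(K2^{2n})^{-6/\delta} + K2^{2n}\,h2^{2n}\, e^{-\frac{u^2}{2h}(1-\delta/2)}\\ &= 6e^{-\frac{6}{\delta}(\log K + 2n\log 2)} + Kh\,e^{4n\log 2 - \frac{u^2}{2h}(1-\delta/2)}. \end{aligned}$$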

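Now we select an integer $n \ge n_1(K, \delta, u)$ such that $-\frac{6}{\delta}(\log K + 2n\log 2) \le -\frac{u^2}{2h}$. The choice

$$n = \left\lceil \frac{1}{2\log 2}\left(\frac{\delta u^2}{12h} - \log K\right)\right\rceil$$

will do if $h$ is small enough, $0 < h \le h_0(K, \delta, u)$, so that $n \ge n_1(K, \delta, u) \ge 2$. Then

$$n < \frac{1}{2\log 2}\left(\frac{\delta u^2}{12h} - \log K\right) + 1$$

holds as well.

With this $n$ (and $h$ small enough) we have $4n\log 2 \le \frac{u^2}{2h}\cdot\frac{\delta}{2} + \log(K^{-3})$, and so

$$P\Big\{\max_{s,t\in[0,K],\ |t-s|\le h} |W(t) - W(s)| \ge u\Big\} \le \left(6 + \frac{h}{K^2}\right) e^{-\frac{u^2}{2h}(1-\delta)}.$$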
If $K \ge 1$, then $h/K^2 \le 1$ (for $h \le 1$) and (82) follows. If $K < 1$, the maximum in (82) cannot exceed the maximum over the interval $[0, 1]$. Then, taking $h_0(K, \delta, u) = h_0(1, \delta, u)$, (82) follows again. □
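The exponent bookkeeping at the end of the proof of Lemma 9 can be checked numerically. Take $n = \lceil(\delta u^2/(12h) - \log K)/(2\log 2)\rceil$. With this $n$, the first term $6(K2^{2n})^{-6/\delta}$ should be at most $6e^{-\frac{u^2}{2h}(1-\delta)}$, and the second term $Kh\,e^{4n\log 2 - \frac{u^2}{2h}(1-\delta/2)}$ should be at most $(h/K^2)e^{-\frac{u^2}{2h}(1-\delta)}$. This is a minimal sketch, not part of the paper. It works with natural logarithms to avoid floating-point underflow. The sample values of $K$, $\delta$, $u$, $h$ are arbitrary assumptions, with $h$ chosen small.

```python
import math

def lemma9_log_terms(K, delta, u, h):
    """Logs of the two bounding terms in the proof of Lemma 9 for the chosen n,
    together with the target exponent -(u^2 / 2h)(1 - delta)."""
    a = u**2 / (2 * h)
    n = math.ceil((delta * u**2 / (12 * h) - math.log(K)) / (2 * math.log(2)))
    log_first = math.log(6) - (6 / delta) * (math.log(K) + 2 * n * math.log(2))
    log_second = math.log(K * h) + 4 * n * math.log(2) - a * (1 - delta / 2)
    return n, log_first, log_second, -a * (1 - delta)

for K, delta, u, h in [(2.0, 0.5, 1.0, 1e-3), (1.0, 0.9, 0.5, 1e-3)]:
    n, lf, ls, target = lemma9_log_terms(K, delta, u, h)
    # first term <= 6 e^{target}, second term <= (h / K^2) e^{target}
    print(n, lf <= math.log(6) + target, ls <= math.log(h / K**2) + target)  # 30 True True, then 14 True True
```

The second inequality needs roughly $\delta u^2/(12h) \ge \log K + 4\log 2$, which is one place where "$h$ small enough" enters.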

7 A Preview of Stochastic Integrals 

To show how stochastic integrals come up as natural tools when working with differential equations that include random effects, and what kind of problems arise when one wants to define them, let us start with the simplest ordinary differential equation

$$x'(t) = f(t) \quad (t \ge 0),$$

where $f$ is a continuous function. If $x(0)$ is given, its unique solution can be obtained by integration,

$$x(t) - x(0) = \int_0^t f(s)\,ds \quad (t \ge 0).$$

Now we modify this simple model by introducing a random term, as is customary in many applications:

$$x'(t) = f(t) + g(t)W'(t) \quad (t \ge 0),$$

where $f$ and $g$ are continuous random functions and $W'(t)$ is the so-called white noise process. We know from Theorem 5 that $W'(t)$ does not exist (at least not in the ordinary sense), but after integration we may get a meaningful solution,

$$x(t) - x(0) = \int_0^t f(s)\,ds + \int_0^t g(s)\,dW(s) \quad (t \ge 0).$$

The second integral here is what one would like to call a stochastic integral, if it can be defined properly.

A natural idea is to define such a stochastic integral as a Riemann-Stieltjes integral [7, Chapter 6] for each sample-function separately. This means that one takes partitions $0 = s_0 < s_1 < \cdots < s_n = t$ and Riemann-Stieltjes sums

$$\sum_{k=1}^{n} g(u_k)\,\big(W(s_k) - W(s_{k-1})\big),$$

where $u_k \in [s_{k-1}, s_k]$ is arbitrary. (To alleviate the notation, we suppress the argument $\omega$ that would refer to a specific sample-function.) Then one would hope that, for a fixed point $\omega$ in the sample space, the Riemann-Stieltjes sums converge to the same limit as the norm of the partition $\|\mathcal{P}\| = \max_{1\le k\le n}|s_k - s_{k-1}|$ tends to 0.

One problem is that this cannot happen for all continuous random functions $g$. The reason is that $W(s)$ has unbounded variation over the interval $[0, t]$, as we saw in Lemma 8. The random function $g$ could be chosen so that a Riemann-Stieltjes sum gets arbitrarily close to the total variation, which is $\infty$. Naturally, this is the case not only with the Wiener process, but with any process whose sample-functions have unbounded variation, see e.g. [5, Section 1-7].

But there is another problem, connected to the choice of the points $u_k \in [s_{k-1}, s_k]$ in the Riemann-Stieltjes sums above. Unlike in the case of ordinary integration, this choice unfortunately does matter. The reason is again the unbounded variation of the sample-functions. The easiest way to illustrate this is with discrete stochastic integrals, that is, sums of random variables. (Such a sum is essentially the same as a Riemann-Stieltjes sum above.)
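Before turning to random walks, a quick numerical sketch (not from the paper) shows the effect for $g = W$ itself. $W$ is sampled exactly at the points of a fine uniform grid, using independent $N(0, t/n)$ increments. We then compare the Riemann-Stieltjes sums with left and with right evaluation points. Their difference is exactly $\sum_k (W(s_k) - W(s_{k-1}))^2$, which does not shrink as the partition is refined. The grid size, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random

def brownian_grid(t, steps, rng):
    """W sampled exactly at the grid points k t / steps: independent N(0, t/steps) increments."""
    sd = math.sqrt(t / steps)
    w = [0.0]
    for _ in range(steps):
        w.append(w[-1] + rng.gauss(0.0, sd))
    return w

def rs_sums(w):
    """Riemann-Stieltjes sums of g = W with respect to W, at left and at right endpoints."""
    pairs = list(zip(w, w[1:]))
    left = sum(a * (b - a) for a, b in pairs)
    right = sum(b * (b - a) for a, b in pairs)
    return left, right

rng = random.Random(2024)
w = brownian_grid(1.0, 100_000, rng)
left, right = rs_sums(w)
print(right - left)   # = sum of squared increments; close to t = 1, not to 0
```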

So let $S_0 = 0$ and $S_n = \sum_{k=1}^n X_k$ be a (simple, symmetric) random walk, just as in Section 1. In the following examples $S_n$ will play the role of the function $g(t)$ above, and the white noise process $W'(t)$ is replaced by the increments $X_n$. In the first case (which corresponds to an Ito-type stochastic integral), we define the discrete stochastic integral as $\sum_{k=1}^n S_{k-1}X_k$. Observe that in this case the integrand is always taken at the left endpoint of the subintervals. The usual reasoning behind this is that $X_k$ gives the "new information" in each term, while the integrand $S_{k-1}$ depends only on the past. That is, the integrand is non-anticipating: it is independent of the future values $X_k, X_{k+1}, \dots$.

This discrete stochastic integral can be evaluated explicitly as

$$\sum_{k=1}^n S_{k-1}X_k = \sum_{k=1}^n S_{k-1}(S_k - S_{k-1}) = \frac12\sum_{k=1}^n \big(S_k^2 - S_{k-1}^2\big) - \frac12\sum_{k=1}^n (S_k - S_{k-1})^2 = \frac{S_n^2}{2} - \frac{n}{2}.$$

Here we used that the first resulting sum telescopes and $S_0^2 = 0$, while each term $(S_k - S_{k-1})^2$ in the second resulting sum is equal to 1. The interesting feature of the result is that it contains the non-classical term $-n/2$. The phrase "non-classical" refers to the fact that $\int_0^{s_n} s\,ds = s_n^2/2$. Altogether, this formula is a special case of the important Ito formula, one of our main subjects from now on.

Of course, it is also interesting to see what happens if the integrand is always evaluated at the right endpoints of the subintervals:

$$\sum_{k=1}^n S_kX_k = \sum_{k=1}^n S_k(S_k - S_{k-1}) = \frac12\sum_{k=1}^n\big(S_k^2 - S_{k-1}^2\big) + \frac12\sum_{k=1}^n(S_k - S_{k-1})^2 = \frac{S_n^2}{2} + \frac{n}{2}.$$

Note that the non-classical term is $+n/2$ here.

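Taking the arithmetic average of the two formulae above, we obtain a Stratonovich-type stochastic integral, which does not contain a non-classical term:

$$\sum_{k=1}^n \frac{S_{k-1} + S_k}{2}\,X_k = \sum_{k=1}^n S\big(k - \tfrac12\big)X_k = \frac{S_n^2}{2}.$$

Here $S(k - \frac12) = (S_{k-1} + S_k)/2$ is the value of the linearly interpolated walk at the midpoint of $[k-1, k]$.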

On the other hand, this type of integral has other disadvantages compared to the Ito-type one. These result from the fact that here the integrand is "anticipating", that is, not independent of the future.
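The three closed forms above can be confirmed on a simulated walk. This short sketch (the helper name is our own) accumulates the left-point and right-point sums in integers. It checks $2\sum S_{k-1}X_k = S_n^2 - n$, $2\sum S_kX_k = S_n^2 + n$, and that their sum, which is twice the Stratonovich-type sum, equals $S_n^2$.

```python
import random

def discrete_integrals(xs):
    """Left-point (Ito-type) and right-point sums of S against X, and the final value S_n."""
    s_prev, left, right = 0, 0, 0
    for x in xs:
        s = s_prev + x
        left += s_prev * x      # integrand S_{k-1}: taken at the left endpoint
        right += s * x          # integrand S_k: taken at the right endpoint
        s_prev = s
    return left, right, s_prev

rng = random.Random(7)
n = 1000
xs = [rng.choice((-1, 1)) for _ in range(n)]
left, right, s_n = discrete_integrals(xs)
# Doubled to stay in integers: 2*left = S_n^2 - n, 2*right = S_n^2 + n, left + right = S_n^2.
print(2 * left == s_n**2 - n, 2 * right == s_n**2 + n, left + right == s_n**2)  # True True True
```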
After showing these (and other) examples in a seminar, P. Revesz asked whether there is a general method to evaluate discrete stochastic integrals of the type $\sum_{k=1}^n f(S_{k-1})X_k$ in closed form, where $f$ is a given function defined on the set of integers $\mathbb{Z}$. In other words, does a discrete Ito formula exist in general? The answer is yes, and fortunately it is quite elementary to see.

But before turning to this, let us see the relationship of such a formula to an alternative way of defining certain stochastic integrals. An important type of stochastic integral is $\int_0^K f(W(s))\,dW(s)$, where $K > 0$ and $f$ is a continuously differentiable function. In other words, the integrand is a smooth function of the Wiener process. In this case the traditional definition of the Ito-type integral is quite similar to that of the Riemann-Stieltjes integral.

Take an arbitrary partition $\mathcal{P} = \{0 = s_0, s_1, \dots, s_{n-1}, s_n = K\}$ of the time axis and a corresponding Riemann-Stieltjes sum, always evaluating the function at the left endpoints of the subintervals:

$$\sum_{k=1}^{n} f(W(s_{k-1}))\,\big(W(s_k) - W(s_{k-1})\big).$$

This sum is a random variable corresponding to the given partition. It can be proved that these random variables converge, e.g. in probability, to a certain random variable $I$ as the norm of the partition goes to 0. This random variable $I$ is then called the Ito integral. We mention that convergence "in probability" means that for any $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$P\left\{\left|I - \sum_{k=1}^{n} f(W(s_{k-1}))\,\big(W(s_k) - W(s_{k-1})\big)\right| > \varepsilon\right\} < \varepsilon$$

whenever $\|\mathcal{P}\| < \delta$.

The alternative method that we will follow in this paper is better suited to the relationship between the Wiener process and random walks discussed above. Mathematically, it somewhat resembles a Lebesgue-Stieltjes integral [7, Chapter 11]. The idea is that we first take a dyadic partition of the spatial axis, each subinterval having length $2^{-m}$, where $m$ is a non-negative integer. Then we determine the corresponding first passage times $s_m(1), s_m(2), \dots$ of the Skorohod imbedding, as explained above. These time instants can be considered as a random partition of the time axis, which in general depends on the sample-function considered.

By Lemma 7b and Theorem 4, the following holds with probability 1 for any $K' > K$ and for all but finitely many $m$: each $s_m(k)$ lies in the interval $[0, K']$ and $W(s_m(k)) = B_m(k2^{-2m})$, $0 \le k2^{-2m} \le K$. By (40), the shrunk random walk $B_m(t)$ can be expressed in terms of ordinary random walks as $B_m(k2^{-2m}) = 2^{-m}S_m(k)$. Now our definition of the Ito integral will be

$$\int_0^K f(W(s))\,dW(s) = \lim_{m\to\infty}\sum_{k=1}^{K2^{2m}} f(W(s_m(k-1)))\,\big(W(s_m(k)) - W(s_m(k-1))\big). \qquad (85)$$

We will show later that this sum, which can be evaluated for each sample-function separately, converges with probability 1. Our method will be to find another form of this sum by a discrete Ito formula, and to take the limit of the equivalent form so obtained.
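The random partition and the sum in (85) can be illustrated with a sketch. A true Wiener path cannot be sampled exactly at first passage times, so this code uses a much finer random walk as a stand-in, which is an assumption made only for illustration: $W(j4^{-n}) \approx 2^{-n}S_n(j)$ with $n > m$. For this walk, the dyadic levels $k2^{-m}$ are hit exactly. The helper names `fine_walk`, `first_passage_times` and `ito_sum` are hypothetical, and the values $n = 6$, $m = 3$, $K = 1$ and $f = \cos$ are arbitrary.

```python
import math
import random

def fine_walk(n, t_max, rng):
    """Hypothetical stand-in for W: an integer random walk S_n(j), j = 0, ..., t_max * 4**n,
    read as W(j 4^{-n}) ~ 2^{-n} S_n(j)."""
    s, path = 0, [0]
    for _ in range(t_max * 4**n):
        s += rng.choice((-1, 1))
        path.append(s)
    return path

def first_passage_times(path, step):
    """Indices 0 = j_0 < j_1 < ... at which the path first moves +-step (in walk units)
    away from its value at the previous index: discrete analogues of s_m(1), s_m(2), ..."""
    times = [0]
    for j in range(1, len(path)):
        if abs(path[j] - path[times[-1]]) == step:
            times.append(j)
    return times

def ito_sum(f, w):
    """Left-point sum  sum_k f(w[k-1]) * (w[k] - w[k-1]),  i.e. the sum inside (85)."""
    return sum(f(a) * (b - a) for a, b in zip(w, w[1:]))

rng = random.Random(3)
n, m, K = 6, 3, 1
path = fine_walk(n, 2 * K, rng)                   # simulate on [0, K'] with K' = 2K
times = first_passage_times(path, 2 ** (n - m))   # spatial mesh 2^{-m} = 2^{n-m} walk units
r_max = min(K * 4**m, len(times) - 1)             # at most K 2^{2m} terms, as in (85)
s_m = [j / 4**n for j in times[: r_max + 1]]      # the random partition of the time axis
w_m = [path[j] / 2**n for j in times[: r_max + 1]]  # W at those times: multiples of 2^{-m}
print(s_m[-1], ito_sum(math.cos, w_m))            # s_m(K 2^{2m}) is typically close to K
```

Each increment of `w_m` is exactly $\pm 2^{-m}$, so the embedded values form a shrunk random walk, as in the text. Increasing $m$ (and $n$ with it) should bring the last passage time closer to $K$, in line with (93) below.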
8 A Discrete Ito Formula



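Let $f$ be a function defined on the set of integers $\mathbb{Z}$. First we define trapezoidal sums of $f$ by

$$T_{j=0}^{k} f(j) = \varepsilon_k\left[\frac12 f(0) + \sum_{j=1}^{|k|-1} f(\varepsilon_k j) + \frac12 f(k)\right], \qquad (86)$$

where $k \in \mathbb{Z}$ (so $k$ can be negative as well!) and

$$\varepsilon_k = \begin{cases} 1 & \text{if } k > 0,\\ 0 & \text{if } k = 0,\\ -1 & \text{if } k < 0. \end{cases} \qquad (87)$$

The reason behind the factor $-1$ when $k < 0$ is the analogy with integration: when the upper limit of integration is less than the lower limit, one can exchange them upon multiplying the integral by $-1$.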

The next statement, which we will call a discrete Ito formula, is a purely algebraic one. This is reflected by the fact that, although we will apply it exclusively to random walks, the lemma holds for any numerical sequence $X_r = \pm 1$, irrespective of any probability assigned to it.

Lemma 10 Take any function $f$ defined on $\mathbb{Z}$ and any sequence $X_r = \pm 1$ $(r \ge 1)$, and let $S_0 = 0$, $S_n = X_1 + X_2 + \cdots + X_n$ $(n \ge 1)$. Then the following statements hold:

Discrete Ito formula:

$$T_{j=0}^{S_n} f(j) = \sum_{r=1}^n f(S_{r-1})X_r + \frac12\sum_{r=1}^n \frac{f(S_r) - f(S_{r-1})}{X_r},$$

and

Discrete Stratonovich formula:

$$T_{j=0}^{S_n} f(j) = \sum_{r=1}^n \frac{f(S_{r-1}) + f(S_r)}{2}\,X_r.$$

Proof. By the definition of a trapezoidal sum,

$$T_{j=0}^{S_r} f(j) - T_{j=0}^{S_{r-1}} f(j) = X_r\,\frac{f(S_{r-1}) + f(S_r)}{2}. \qquad (88)$$

Indeed, if $X_r = S_r - S_{r-1}$ equals 1, one has to add the term $(f(S_{r-1}) + f(S_r))/2$, while if $X_r = -1$, one has to subtract this term.

Since $X_r = \pm 1$, and hence $X_r = 1/X_r$, the right hand side of (88) can be written as

$$T_{j=0}^{S_r} f(j) - T_{j=0}^{S_{r-1}} f(j) = f(S_{r-1})X_r + \frac12\,\frac{f(S_r) - f(S_{r-1})}{X_r}. \qquad (89)$$

By summing up (89), respectively (88), for $r = 1, 2, \dots, n$ we obtain the statements of the lemma, since the sum telescopes and $T_{j=0}^{S_0} f(j) = 0$:

$$\sum_{r=1}^n \left(T_{j=0}^{S_r} f(j) - T_{j=0}^{S_{r-1}} f(j)\right) = T_{j=0}^{S_n} f(j).$$

□
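Because Lemma 10 is purely algebraic, it can be checked exactly by machine. The sketch below uses helper names of our own (`eps`, `trap`, `lemma10_sides`). It implements the trapezoidal sum (86) with the sign factor (87) in exact rational arithmetic. It then compares the result with the right hand sides of both the discrete Ito and the discrete Stratonovich formula, for a random $\pm 1$ sequence and an arbitrarily chosen integer-valued $f$.

```python
import random
from fractions import Fraction

def eps(k):
    """The sign factor epsilon_k of (87)."""
    return (k > 0) - (k < 0)

def trap(f, k):
    """Trapezoidal sum T_{j=0}^{k} f(j) of (86) for an integer k (negative k allowed)."""
    e = eps(k)
    if e == 0:
        return Fraction(0)
    inner = sum(f(e * j) for j in range(1, abs(k)))
    return e * (Fraction(f(0), 2) + inner + Fraction(f(k), 2))

def lemma10_sides(f, xs):
    """(trapezoidal sum, discrete Ito right side, discrete Stratonovich right side)."""
    s = [0]
    for x in xs:
        s.append(s[-1] + x)                          # S_r = X_1 + ... + X_r
    triples = list(zip(s, s[1:], xs))                # (S_{r-1}, S_r, X_r)
    ito = sum(f(a) * x for a, b, x in triples) \
        + Fraction(1, 2) * sum(Fraction(f(b) - f(a), x) for a, b, x in triples)
    strat = sum(Fraction(f(a) + f(b), 2) * x for a, b, x in triples)
    return trap(f, s[-1]), ito, strat

rng = random.Random(11)
xs = [rng.choice((-1, 1)) for _ in range(200)]
f = lambda j: j**3 - 2 * j + 5                       # an arbitrary integer-valued f on Z
T, ito, strat = lemma10_sides(f, xs)
print(T == ito == strat)                             # True
```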

We need a version of Lemma 10 that can be applied to shrunk random walks $B_m(t)$ as well. Therefore we define trapezoidal sums of a function $f$ over an equidistant partition with points $x = j\Delta x$, where $\Delta x > 0$ and $j$ ranges over the set of integers $\mathbb{Z}$. Here the function $f$ is assumed to be defined on the set of real numbers $\mathbb{R}$. A corresponding trapezoidal sum is then

$$T_{x=0}^{a} f(x)\,\Delta x = \varepsilon_a\,\Delta x\left[\frac12 f(0) + \sum_{j=1}^{(|a|/\Delta x)-1} f(\varepsilon_a j\Delta x) + \frac12 f(a)\right], \qquad (90)$$

where $a$ is assumed to be an integer multiple of $\Delta x$ and $\varepsilon_a$ is defined according to (87). In the sequel this definition will be applied with $\Delta x = 2^{-m}$. We write the corresponding version of Lemma 10 directly for shrunk random walks $B_m(t)$, though this lemma is also of a purely algebraic nature.

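Lemma 11 Take any function $f$ defined on $\mathbb{R}$ and any real $K > 0$, and fix a non-negative integer $m$. Consider shrunk random walks $B_m(r2^{-2m}) = 2^{-m}S_m(r)$ $(r \ge 0)$. Then the following statements hold ($\Delta x = 2^{-m}$, $\Delta t = 2^{-2m}$):

Ito case:

$$\begin{aligned} T_{x=0}^{B_m(\lfloor K/\Delta t\rfloor\Delta t)} f(x)\,\Delta x &= \sum_{r=1}^{\lfloor K/\Delta t\rfloor} f(B_m((r-1)\Delta t))\,\big(B_m(r\Delta t) - B_m((r-1)\Delta t)\big)\\ &\quad + \frac12\sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(B_m(r\Delta t)) - f(B_m((r-1)\Delta t))}{B_m(r\Delta t) - B_m((r-1)\Delta t)}\,\Delta t, \end{aligned} \qquad (91)$$

and

Stratonovich case:

$$T_{x=0}^{B_m(\lfloor K/\Delta t\rfloor\Delta t)} f(x)\,\Delta x = \sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(B_m((r-1)\Delta t)) + f(B_m(r\Delta t))}{2}\,\big(B_m(r\Delta t) - B_m((r-1)\Delta t)\big). \qquad (92)$$

Proof. The proof is essentially the same as for Lemma 10, and is therefore omitted. □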
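The same kind of exact check works for the rescaled version (90)-(92). Each increment of $B_m$ is $\pm\Delta x$ and $\Delta t = (\Delta x)^2$. Hence the correction term in (91) contributes exactly $\frac12\big(f(B_m(r\Delta t)) - f(B_m((r-1)\Delta t))\big)\big(B_m(r\Delta t) - B_m((r-1)\Delta t)\big)$ per step, which turns (91) into (92). The sketch below uses hypothetical helper names and an arbitrary polynomial $f$, and confirms this with exact rational arithmetic.

```python
import random
from fractions import Fraction

def trap_dx(f, a, dx):
    """Trapezoidal sum T_{x=0}^{a} f(x) dx of (90); a must be an integer multiple of dx."""
    ratio = a / dx
    assert ratio.denominator == 1, "a must be an integer multiple of dx"
    k = ratio.numerator
    e = (k > 0) - (k < 0)                          # epsilon_a of (87)
    if e == 0:
        return Fraction(0)
    inner = sum(f(e * j * dx) for j in range(1, abs(k)))
    return e * dx * (f(Fraction(0)) / 2 + inner + f(a) / 2)

def lemma11_sides(f, xs, m):
    """Left side of (91)/(92), right side of (91), right side of (92), for B_m(r dt) = 2^{-m} S_m(r)."""
    dx, dt = Fraction(1, 2**m), Fraction(1, 4**m)
    b = [Fraction(0)]
    for x in xs:
        b.append(b[-1] + x * dx)
    pairs = list(zip(b, b[1:]))                    # (B_m((r-1)dt), B_m(r dt))
    ito = sum(f(p) * (q - p) for p, q in pairs) \
        + Fraction(1, 2) * sum((f(q) - f(p)) / (q - p) * dt for p, q in pairs)
    strat = sum((f(p) + f(q)) / 2 * (q - p) for p, q in pairs)
    return trap_dx(f, b[-1], dx), ito, strat

rng = random.Random(5)
xs = [rng.choice((-1, 1)) for _ in range(300)]
T, ito, strat = lemma11_sides(lambda x: x**3 - x, xs, m=3)
print(T == ito == strat)                           # True
```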

Now recall Lemma 7b and Theorem 4. With probability 1, for any $K' > K$ and for all but finitely many $m$, there exist random time instants $s_m(r) \in [0, K']$ (the first passage times of the Skorohod imbedding) such that $W(s_m(r)) = B_m(r\Delta t)$ and

$$\max_{0\le r\Delta t\le K}|s_m(r) - r\Delta t| \le \sqrt{27Km}\,2^{-m}, \qquad (93)$$

which goes to 0 as $m \to \infty$.

In this light, the values $B_m(r\Delta t)$ of the shrunk random walks can be replaced by the values $W(s_m(r))$ of the Wiener process in (91) and (92). Then the first sum on the right hand side of (91) becomes exactly the sum whose limit as $m \to \infty$ is our definition (85) of the Ito integral. Similarly, the right hand side of (92) is the sum whose limit will be our definition of the Stratonovich integral.

The most important feature of Lemma 11 is that these limits can be evaluated in terms of limits of other, simpler sums. A further gain is that after performing the limits, we immediately obtain the important Ito and Stratonovich formulae for the corresponding types of stochastic integrals.



9 Stochastic Integrals and the Ito formula 

Theorem 6 Let $f$ be a continuously differentiable function on the set of real numbers $\mathbb{R}$, and let $K > 0$. For $m \ge 0$ and $k \ge 0$, take the first passage times $s_m(k)$ of the Skorohod imbedding of shrunk random walks into the Wiener process, as defined by (65). Then the sums below converge with probability 1:

Ito integral:

$$\int_0^K f(W(s))\,dW(s) = \lim_{m\to\infty}\sum_{r=1}^{K2^{2m}} f(W(s_m(r-1)))\,\big(W(s_m(r)) - W(s_m(r-1))\big), \qquad (94)$$

and

Stratonovich integral:

$$\int_0^K f(W(s))\circ dW(s) = \lim_{m\to\infty}\sum_{r=1}^{K2^{2m}}\frac{f(W(s_m(r-1))) + f(W(s_m(r)))}{2}\,\big(W(s_m(r)) - W(s_m(r-1))\big). \qquad (95)$$

For the corresponding stochastic integrals we also have the following formulae:

Ito formula:

$$\int_0^{W(K)} f(x)\,dx = \int_0^K f(W(s))\,dW(s) + \frac12\int_0^K f'(W(s))\,ds, \qquad (96)$$

and

Stratonovich formula:

$$\int_0^{W(K)} f(x)\,dx = \int_0^K f(W(s))\circ dW(s). \qquad (97)$$
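Formulae (96) and (97) can be sanity-checked numerically. Note that this check does not use the paper's first-passage construction. It relies on the classical fixed-grid sums with $W$ sampled exactly at grid points, which the text below states agree in the limit with the definitions given here. The choice $f = \cos$ (so that $\int_0^y f(x)\,dx = \sin y$), the grid size, and the seed are arbitrary assumptions. This is a minimal sketch, not a convergence proof.

```python
import math
import random

def brownian_grid(K, steps, rng):
    """W sampled exactly at t_k = k K / steps via independent N(0, K/steps) increments."""
    dt = K / steps
    sd = math.sqrt(dt)
    w = [0.0]
    for _ in range(steps):
        w.append(w[-1] + rng.gauss(0.0, sd))
    return w, dt

f = math.cos                        # integrand f (arbitrary choice)
f_prime = lambda x: -math.sin(x)    # its derivative f'
F = math.sin                        # antiderivative with F(0) = 0, so int_0^y f(x) dx = F(y)

rng = random.Random(5)
K = 1.0
w, dt = brownian_grid(K, 100_000, rng)
pairs = list(zip(w, w[1:]))
ito = sum(f(a) * (b - a) for a, b in pairs)                  # left-point sums
strat = sum((f(a) + f(b)) / 2 * (b - a) for a, b in pairs)   # averaged endpoints
half_drift = 0.5 * sum(f_prime(a) for a, _ in pairs) * dt    # (1/2) int_0^K f'(W(s)) ds
lhs = F(w[-1])                                               # int_0^{W(K)} f(x) dx
print(lhs, ito + half_drift, strat)   # the three numbers should nearly agree
```

Dropping `half_drift` from the Ito side typically leaves a visible gap. That gap is the numerical trace of the non-classical term.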

Proof. By the Ito case of Lemma 11 and the comments made after the lemma, with probability 1 and for all but finitely many $m$, we have the following equation for the sum in (94):

$$\begin{aligned}\sum_{r=1}^{\lfloor K/\Delta t\rfloor} f(W(s_m(r-1)))\,\big(W(s_m(r)) - W(s_m(r-1))\big) &= T_{x=0}^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,\Delta x\\ &\quad - \frac12\sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))}\,\Delta t,\end{aligned} \qquad (98)$$

where $\Delta x = 2^{-m}$ and $\Delta t = 2^{-2m}$.

For $t \in [0, K]$ set $t_m = \lfloor t/\Delta t\rfloor\Delta t$. Then $|t - t_m| < \Delta t = 2^{-2m}$. By (93), $|t_m - s_m(\lfloor t/\Delta t\rfloor)| \le \sqrt{27Km}\,2^{-m}$ with probability 1 if $m$ is large enough. This implies that

$$\max_{0\le t\le K}|t - s_m(\lfloor t/\Delta t\rfloor)| \to 0 \qquad (99)$$

with probability 1 as $m \to \infty$. Further, since the sample-functions of the Wiener process are uniformly continuous on $[0, K']$ with probability 1, it follows that

$$\max_{0\le t\le K}|W(t) - W(s_m(\lfloor t/\Delta t\rfloor))| \to 0 \qquad (100)$$

as well.

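In particular, it follows that $W(s_m(\lfloor K/\Delta t\rfloor)) \to W(K)$ with probability 1 as $m \to \infty$. On the other hand, the trapezoidal sum $T_{x=0}^{a} f(x)\,\Delta x$ of a continuous function $f$ is a Riemann sum corresponding to the partition $\{0, \frac12\Delta x, \frac32\Delta x, \dots, a - \frac32\Delta x, a - \frac12\Delta x, a\}$, with tag points $0, \Delta x, 2\Delta x, \dots, a$. Therefore the trapezoidal sums converge to $\int_0^a f(x)\,dx$ as $\Delta x \to 0$. These facts show that for any $\varepsilon > 0$,

$$\begin{aligned}&\left|T_{x=0}^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,\Delta x - \int_0^{W(K)} f(x)\,dx\right|\\ &\quad\le \left|T_{x=0}^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,\Delta x - \int_0^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,dx\right| + \left|\int_{W(s_m(\lfloor K/\Delta t\rfloor))}^{W(K)} f(x)\,dx\right| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon\end{aligned}$$

with probability 1 if $m$ is large enough. That is, the trapezoidal sum in (98) tends to the corresponding integral with probability 1:

$$\lim_{m\to\infty} T_{x=0}^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,\Delta x = \int_0^{W(K)} f(x)\,dx. \qquad (101)$$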

Now let us turn to the second sum in (98). By the definition of the first passage times, $W(s_m(r)) - W(s_m(r-1)) = \pm 2^{-m} = \pm\Delta x$, which tends to 0 as $m \to \infty$. Hence

$$\frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))} = \frac{f(W(s_m(r)) \mp \Delta x) - f(W(s_m(r)))}{\mp\Delta x}. \qquad (102)$$

We want to show that this difference quotient gets arbitrarily close to $f'(W(r\Delta t))$ if $m$ is large enough.

To this end, let us consider the following problem from calculus. Let $f$ be a continuously differentiable function, and let $x_m \to x$ and $\Delta x_m \to 0$ as $m \to \infty$. We compare $f'(x)$ with $(f(x_m + \Delta x_m) - f(x_m))/\Delta x_m$. By the mean value theorem, the latter difference quotient is equal to $f'(u_m)$, where $u_m$ lies between $x_m$ and $x_m + \Delta x_m$, both of which tend to $x$. Since $f'$ is continuous, this implies that

$$\frac{f(x_m + \Delta x_m) - f(x_m)}{\Delta x_m} \to f'(x) \qquad (103)$$

as $m \to \infty$.

In our present context, $x = W(t)$ and $x_m = W(s_m(\lfloor t/\Delta t\rfloor))$, where $0 \le t \le K$. Since the sample-functions of $W(t)$ are continuous with probability 1, it follows from the max-min theorem that their ranges over $[0, K']$ are contained in bounded intervals. Over such a bounded interval the function $f'$ is uniformly continuous. Therefore (99), (100), and (103) imply

$$\max_{0\le t\le K}\left|f'(W(t)) - \frac{f\big(W(s_m(\lfloor t/\Delta t\rfloor)) \mp \Delta x\big) - f\big(W(s_m(\lfloor t/\Delta t\rfloor))\big)}{\mp\Delta x}\right| \to 0 \qquad (104)$$

with probability 1 as $m \to \infty$. (Remember that now $\Delta x = 2^{-m}$ and $\Delta t = 2^{-2m}$.)
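In particular, for any $\varepsilon > 0$ we have

$$\begin{aligned}&\max_{0<r\Delta t\le K}\left|\frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))} - f'(W(r\Delta t))\right|\\ &\quad= \max_{0<r\Delta t\le K}\left|\frac{f(W(s_m(r)) \mp \Delta x) - f(W(s_m(r)))}{\mp\Delta x} - f'(W(r\Delta t))\right| < \frac{\varepsilon}{3K}\end{aligned} \qquad (105)$$

with probability 1, assuming $m$ is large enough.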

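The function $f'(W(s))$ is continuous with probability 1, so its Riemann sums over $[0, K]$ converge to the corresponding integral as the norm of the partition tends to 0. Thus by (105),

$$\begin{aligned}&\left|\int_0^K f'(W(s))\,ds - \sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))}\,\Delta t\right|\\ &\quad\le \sum_{r=1}^{\lfloor K/\Delta t\rfloor}\left|f'(W(r\Delta t)) - \frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))}\right|\Delta t\\ &\qquad + \left|\int_0^{K_m} f'(W(s))\,ds - \sum_{r=1}^{\lfloor K/\Delta t\rfloor} f'(W(r\Delta t))\,\Delta t\right| + \left|\int_{K_m}^{K} f'(W(s))\,ds\right|\\ &\quad< K\,\frac{\varepsilon}{3K} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon\end{aligned}$$

with probability 1 if $m$ is large enough. Here $K_m = \lfloor K/\Delta t\rfloor\Delta t$.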

Therefore the second sum in (98) also tends to the corresponding integral with probability 1:

$$\lim_{m\to\infty}\frac12\sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(W(s_m(r))) - f(W(s_m(r-1)))}{W(s_m(r)) - W(s_m(r-1))}\,\Delta t = \frac12\int_0^K f'(W(s))\,ds.$$

This proves that the defining sum of the Ito integral in (94) converges with probability 1 as $m \to \infty$, and that the limit satisfies the Ito formula (96).

Also, by the Stratonovich case of Lemma 11 and the comments made after the lemma, with probability 1 and for all but finitely many $m$, we have the following equation for the sum in (95):

$$\sum_{r=1}^{\lfloor K/\Delta t\rfloor}\frac{f(W(s_m(r-1))) + f(W(s_m(r)))}{2}\,\big(W(s_m(r)) - W(s_m(r-1))\big) = T_{x=0}^{W(s_m(\lfloor K/\Delta t\rfloor))} f(x)\,\Delta x.$$

We saw in (101) that this trapezoidal sum converges to the corresponding integral with probability 1 as $m \to \infty$. Therefore the defining sum of the Stratonovich integral in (95) converges as well, and the limit satisfies formula (97). □

Since the Ito and Stratonovich formulae are also valid for the usual definitions of the corresponding stochastic integrals, this shows that the usual definitions agree with the definitions given in this paper.

As we mentioned in a special case, the interesting feature of the Ito formula (96) is that it contains the non-classical term $\frac12\int_0^K f'(W(s))\,ds$. If $g$ denotes an antiderivative of the function $f$, then the Ito formula can be written as

$$g(W(K)) - g(W(0)) = \int_0^K g'(W(s))\,dW(s) + \frac12\int_0^K g''(W(s))\,ds.$$

We mention that other, more complicated versions of the Ito formula can be proved by essentially the same method, see [8]. Also, as shown there, multiple stochastic integrals can be defined analogously to the stochastic integrals defined above.